arXiv:1502.03451v1 [cs.IT] 11 Feb 2015


Cornerstones of Sampling of Operator 
Theory 


David Walnut, Götz E. Pfander, Thomas Kailath


Abstract This paper reviews some results on the identifiability of classes 
of operators whose Kohn-Nirenberg symbols are band-limited (called
band-limited operators), which we refer to as sampling of operators. We trace the
motivation and history of the subject back to the original work of the third- 
named author in the late 1950s and early 1960s, and to the innovations in 
spread-spectrum communications that preceded that work. We give a brief 
overview of the NOMAC (Noise Modulation and Correlation) and Rake
receivers, which were early implementations of spread-spectrum multi-path
wireless communication systems. We examine in detail the original proof 
of the third-named author characterizing identifiability of channels in terms 
of the maximum time and Doppler spread of the channel, and do the same 
for the subsequent generalization of that work by Bello. The mathematical 
limitations inherent in the proofs of Bello and the third author are removed 
by using mathematical tools unavailable at the time. We survey more recent 
advances in sampling of operators and discuss the implications of the use of 
periodically-weighted delta-trains as identifiers for operator classes that
satisfy Bello’s criterion for identifiability, leading to new insights into the theory
of finite-dimensional Gabor systems. We present novel results on operator 
sampling in higher dimensions, and review implications and generalizations 
of the results to stochastic operators, MIMO systems, and operators with 
unknown spreading domains. 

1 Introduction


The problem of identification of a time-variant communication channel arose 
in the 1950s as the problem of secure long-range wireless communications 
became increasingly important due to the geopolitical situation at the time. 
Some of the theoretical and practical advances made then are described in
this paper. More recent advances, known as sampling of operators, which
extend the theory to more general operators and put it on a more rigorous
mathematical footing, are developed here as well.

The launching point for the theory of operator sampling is the early work 
of the third-named author in his Master’s thesis at MIT, entitled “Sampling 
models for linear time-variant filters” ED, see also [ 23 } HBj , and [23] in which 
he reviews the identification problem for time-variant channels. The third-named
author, as well as Bello in subsequent work [5], were attempting to
understand and describe the theoretical limits of identifiability of time-variant
communication channels. Section 2 of this paper describes in some detail their
work and explores some of the important mathematical challenges they faced.
In Section 3 we describe the more recently developed framework of operator
sampling. Results addressing the problem considered by Bello are based on
insights on finite-dimensional Gabor systems, which are presented in Section 4.
Malikiosis’s recent result [34] allows for the generalization of those results to
a higher-dimensional setting; these are stated and proven in Section 5. We
conclude the paper in Section 6 with a short summary of the sampling of
operators literature, that is, of results presented in detail elsewhere.


2 Historical Remarks. 

2.1 The Cold War Origins of the Rake System. 


In 1958, Price and Green published A Communication Technique for Multi- 
path Channels in Proc. IRE 152 . in which they describe a communication 
system called Rake, designed to solve the multi-path problem. When a wireless
transmitter does not have line-of-sight with the receiver, the transmitted
signal is reflected, possibly multiple times, before reaching the receiver. Reflection
by stationary objects such as the ground or buildings introduces random
time delays to the signal, and reflection or refraction by moving objects such
as clouds, the troposphere, ionosphere, or a moving vehicle produces random
frequency or Doppler shifts in the signal as well. Due to scattering and absorption,
the reflected signals are randomly amplitude-attenuated too. The
problem is to recover the transmitted signal as accurately as possible from the
superposition of time-frequency-shifted and randomly amplitude-attenuated
versions of it. Since the locations and velocities of the reflecting objects change
with time, the effects of the unknown, time-variant channel must be estimated
and compensated for.

Price and Green’s paper m was the disclosure in the literature of a long-distance
system of wide-band or spread-spectrum communications that had
been developed in response to strategic needs related to the Cold War. This
fascinating story has been described in several articles by those directly involved
([63], [64], [59], [12]). We present a summary of those remarks and of
the Rake system below. The goal is to motivate the original work of the 
third-named author on which the theory of operator sampling is based. 

In the years following World War II, the Soviet Union was exercising its 
power in Eastern Europe with a major point of contention being Berlin, which 
the Soviets blockaded in the late 1940s. This made secure communication with 
Berlin a top priority. As Paul Green describes it, 

[T]he Battle of Berlin was raging, the Russians having isolated the city physically 
on land, so that the Berlin Airlift was resorted to, and nobody knew when all the 
communication links would begin to be subjected to heavy Soviet jamming. m 

By 1950, with a shooting war in Korea about to begin, the Army Signal
Corps approached researchers at MIT to develop secure and reliable wireless
communication with the opposite ends of the world. According to Green,

It is difficult today to recall the fearful excitement of those times. The Russians 
were thought to be 12 feet high in anything having to do with applying mathematics
to communication problems (“all Russians were Kolmogorovs or Kotelnikovs”)....
[T]here was a huge backlog of unexploited theory lying around, and people
were beginning to build digital equipment with the unheard of complexity of a
hundred or so vacuum tube-based bits (!). And the money flowed. m 

The effort was called Project Lincoln (precursor to Lincoln Laboratory). The 
researchers were confronted by two main problems: 1) making a communications
system robust to noise and deliberate jamming, and 2) enabling good
signal recovery from multiple paths. 


2.2 Spread Spectrum communications and NOMAC 

The technique chosen to address the first problem is an application of the 
notion, already well-understood and used by that time, that combatting distortions
from noise and jamming can be achieved by spreading the signal over
a wide frequency band. The idea of spreading the spectrum had been around
for a long time [59], [69], [56] and can be found even in a now famous Hedy
Lamarr-George Antheil patent of 1942 [35], [55], which introduced the concept
later called “frequency hopping”. The system called NOMAC (Noise Modulation
and Correlation) was developed in the early 1950s and used noise-like
(pseudo-noise or PN) signals to achieve spectrum spreading. Detailed
discussion of its history can be found in mmm- 

The huge backlog of “unexploited theory” mentioned above included the
recent work of Claude Shannon on communication theory m, of Norbert
Wiener on correlation functions and least mean squares prediction and filtering
EL and recent applications of statistical decision theory to detection
problems in radar and communications.

The communication problem addressed by NOMAC was to encode data 
represented by a string of ones and zeros into analog signals that could be 
electromagnetically transmitted over a noisy communication channel in a 
way that foiled “jamming” by enemies. The analog signals $x_1(\cdot)$ and $x_0(\cdot)$,
commonly called Mark and Space, associated with the data digits 1 and
0, were chosen to be waveforms of approximate bandwidth $B$, and with
small cross correlation. The target application was 60 wpm teletype, with
22 msec per digit (called a baud), which corresponds to a transmission rate
of 1/(0.022 sec) ≈ 45 Hz. The transmitted signals were chosen to have a bandwidth
of 10 kHz, which was therefore expected to yield a “jamming suppression
ratio” of 10,000/45 ≈ 220, or 23 dB ['Q, TO]. The jamming suppression ratio
is often called the “correlation gain”, because the receiver structure involves
cross correlation of the received signal with each of the possible transmitted
signals. If the correlation with the signal $x_1(\cdot)$ is larger than the one with
the signal $x_0(\cdot)$, then it is decided that the transmitted signal corresponded
to the digit 1. This scheme can be shown to be optimum in the sense of
minimum probability of error provided that the transmitted signals are not
distorted by the communications channel and that the receiver noise is white
Gaussian noise (see, for example, El). Protection against jamming arises
because, unless the jammer has good knowledge of the noise-like transmitted
signals, any jamming signals would just appear as additional noise at the
output of the cross correlations.
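
The arithmetic behind the quoted correlation gain, and the Mark/Space decision rule, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the noise-like waveforms are modeled as random ±1 chips (one chip per unit of time-bandwidth product), and the channel only adds unit-variance Gaussian noise. It is not a description of the actual NOMAC hardware.

```python
import math
import random

def correlation_gain(bandwidth_hz, baud_seconds):
    """Return (data rate in Hz, spreading ratio, ratio in dB)."""
    data_rate_hz = 1.0 / baud_seconds
    ratio = bandwidth_hz / data_rate_hz
    return data_rate_hz, ratio, 10.0 * math.log10(ratio)

def detect(received, mark, space):
    """Decide 1 (Mark) or 0 (Space) by comparing cross correlations."""
    corr_mark = sum(r * m for r, m in zip(received, mark))
    corr_space = sum(r * s for r, s in zip(received, space))
    return 1 if corr_mark > corr_space else 0

# Numbers from the text: 10 kHz bandwidth, 22 msec per digit.
rate_hz, ratio, ratio_db = correlation_gain(10_000.0, 0.022)
print(f"rate = {rate_hz:.1f} Hz, ratio = {ratio:.0f}, gain = {ratio_db:.1f} dB")

# Toy noise-like waveforms (assumption): random +/-1 chips.
rng = random.Random(0)
n_chips = round(ratio)
mark = [rng.choice((-1.0, 1.0)) for _ in range(n_chips)]
space = [rng.choice((-1.0, 1.0)) for _ in range(n_chips)]
noisy_mark = [m + rng.gauss(0.0, 1.0) for m in mark]
noisy_space = [s + rng.gauss(0.0, 1.0) for s in space]
print(detect(noisy_mark, mark, space), detect(noisy_space, mark, space))
```

With the exact rate 1/0.022 Hz the ratio comes out at 220, in line with the roughly 23 dB quoted above.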

More details on the nontrivial ideas required for building a practical system 
can be found in the references. We may mention that the key ideas arose from 
three classified MIT dissertations by Basore [5], Pankowski [35], and Green
m; in fact, documents on NOMAC remained classified until 1961 [13].

A transcontinental experiment was run on a NOMAC system, but was 
found to have very poor performance because of the presence of multiple 
paths; the signals arriving at the receiver by these different paths sometimes 
interfere destructively. This is the phenomenon of “fading”, which causes self-jamming
of the system. Some improvement was achieved by adding
circuitry in the receiver to separately identify and track the two strongest
signals and combine them after phase correction; this use of time and space
diversity enabled a correlation gain of 17 dB, 6 dB short of the expected performance.
It was determined that this loss was because of the neglected weaker
paths, of which there could be as many as 20 or 30. So attention turned to a 
system that would allow the use of all the different paths. 

2.3 The Rake system

One conceptual basis for this new system was provided by the doctoral thesis 
of Robert Price [60], the main results of which were published in 1956 [51].
In a channel with severe multi-path the signal at the receiver is composed of a
large number of signals of different amplitudes and phases, and so Price made
the assumption that the received “signal” was a Gaussian random process.
He studied the problem of choosing between the hypotheses

\[ H_i : w(\cdot) = A x_i(\cdot) + n(\cdot), \quad i = 0, 1, \]

where the random time-variant linear communication channel $A$ is such that
the $\{A x_i(\cdot)\}$ are Gaussian processes. In this case, the earlier cross correlation
detection scheme makes no sense, because the “signal” arriving at the
receiver is not deterministic but is a sample function of a random process,
which is not available to the receiver because it is corrupted by the additive
noise. Price worked out the optimum detection scheme and then ingeniously
interpreted the mathematical formulas to conclude that the new receiver
forms least mean-square estimates of the $\{A x_i(\cdot)\}$ and then crosscorrelates
the $w(\cdot)$ against these estimates. In practice of course, one does not have
enough statistical information to form these estimates and therefore more 
heuristic estimates are used and this was done in the actual system that was 
built. The main heuristic, from Wiener’s least mean-square smoothing filter 
solution and earlier insights, is that one should give greater weight to paths 
with higher signal-to-noise ratio. 

So Price and Green devised a new receiver structure comprising a delay
line of length 3 ms (the maximum expected time spread in their
channel), with 30 taps spaced every 1/(10 kHz), or 100 μs. This would enable
the capture of all the multi-path signals in the channel. Then the tap gains
were made proportional to the strength of the signal received at that tap.
Since a Mark/Space decision was only needed every 22 ms (for the transmission
rate of 60 wpm), and since the fading rate of the channel was slow enough
that the channel characteristics remain constant over even longer than 22 ms,
tap gains could be averaged over several 3 ms intervals. The new system was
called “Rake”, because the delay line structure resembled that in a typical
garden rake!
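
The tap count follows directly from the delay spread and the signal bandwidth, and the weighting rule can be mimicked by a strength-weighted average. The sketch below is a simplification under stated assumptions: `rake_tap_count` and `combine` are hypothetical helper names, the per-tap numbers are made up, and the real receiver's phase correction and analog averaging are not modeled.

```python
def rake_tap_count(max_delay_spread_s, bandwidth_hz):
    """Taps needed to span the delay spread at spacing 1/bandwidth."""
    spacing_s = 1.0 / bandwidth_hz
    return round(max_delay_spread_s / spacing_s), spacing_s

def combine(tap_correlations, tap_strengths):
    """Weight each tap's correlator output by its measured path strength."""
    total = sum(tap_strengths)
    return sum(g * c for g, c in zip(tap_strengths, tap_correlations)) / total

# Numbers from the text: 3 ms delay line, 10 kHz bandwidth.
n_taps, spacing_s = rake_tap_count(3e-3, 10e3)
print(n_taps, spacing_s)

# Hypothetical per-tap correlator outputs and path strengths (illustrative only).
correlations = [0.9, 0.2, 0.6]
strengths = [3.0, 0.5, 1.5]
print(combine(correlations, strengths))
```

Strong paths dominate the combined statistic, which is the heuristic of giving greater weight to paths with higher signal-to-noise ratio.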

Trials showed that this scheme worked well enough to recover the 6 dB loss
experienced by the NOMAC system. The system was put into production and
was successfully used for jam-proof communications between Washington DC
and Berlin during the “Berlin crisis” in the early 1960s.

HF communications is no longer very significant, but the Rake receiver 
has found application in a variety of problems such as sonar, the detection of 
underground nuclear explosions, and planetary radar astronomy (pioneered 
by Price and Green, [III EH]) and currently it is much used in mobile wireless 
communications. It is interesting to note that the eight racks of equipment
needed to build the Rake system in the 1960s are now captured in a small
integrated circuit chip in a smartphone!

However, the fact that the Rake system did not perform satisfactorily when
the fading rates of the communication channel were not very slow led MIT
professor John Wozencraft (who had been part of the Rake project team at
Lincoln Lab) to suggest in 1957 (even before the open 1958 publication of the
Rake system) to his new graduate student Thomas Kailath a fundamental
study of linear time-variant communication channels and their identifiability
for his Master’s thesis. While linear time-variant systems had begun to
be studied at least as early as 1950 (notably by Zadeh m), in communication
systems there are certain additional constraints, notably limits on the
bandwidths of the input signal and the duration of the channel memory. So
a more detailed study was deemed to be worthwhile.


2.4 Kailath’s Time-Variant Channel Identification 
Condition 


In the paper [21], the author considers the problem of measuring a channel
whose characteristics vary rapidly with time. He considers the dependence
of any theoretical channel estimation scheme on how rapidly the channel
characteristics change and concludes that there are theoretical limits on the
ability to identify a rapidly changing channel. He models the channel $A$ as a
linear time-variant filter and defines

$A(\lambda,t)$ = response of $A$, measured at time $t$, to a unit impulse input at time $t-\lambda$.

$A(\lambda,t)$ is one form of the time-variant impulse response of the linear channel
that emphasizes the role of the “age” variable $\lambda$. The channel response
to an input signal $x(\cdot)$ is

\[ \int A(\lambda,t)\, x(t-\lambda)\, d\lambda. \]

An impulse response $A(\lambda,t) = A(\lambda)$ represents a time-invariant filter. Further,
the author states

Therefore the rate of variation of $A(\lambda,t)$ with $t$, for fixed $\lambda$, is a measure of the rate
of variation of the filter. It is convenient to measure this variation in the frequency
domain by defining a function $\mathcal{A}$

\[ \mathcal{A}(\lambda,f) = \int_{-\infty}^{\infty} A(\lambda,t)\, e^{-2\pi i f t}\, dt \]


Then he defines

\[ B = \max\,[\, b - a, \text{ where } \mathcal{A}(\lambda,f) = 0 \text{ for } f \notin [a,b] \,]. \]

While symmetric support is assumed in the paper, this definition makes clear
that non-rectangular regions of support are already in view. Additionally, he
defines the memory, the maximum time-delay spread in response to an
impulse of the channel, as

\[ L = \max\,[\, \min \lambda' \text{ such that } A(\lambda,t) = 0,\ \lambda > \lambda' \,]. \]

In short, the assumption in the continuation of the paper is that

\[ \operatorname{supp} \mathcal{A}(\lambda,f) \subseteq [0,L] \times [-W,W], \]

where $W = B/2$. The function $\mathcal{A}(\lambda,f)$ is often called the spreading function
of the channel. He then asks under what assumptions on $L$ and $B = 2W$
such a channel can be measured. In the context of the Rake system, this
translates to the question of whether there are limits on the rate of variation
of the filter that can assure that the measurement filter can be presumed to
be effective.

The author’s assertion is that as long as BL < 1, then a “simple measure¬ 
ment scheme” is sufficient. 

We have assumed that the bandwidth of any “tap function”, $A_\lambda(t)$ [= $A(\lambda,\cdot)$], is
limited to a frequency region of width $B$, say a low-pass region $(-W, W)$ for which
$B = 2W$. Such band-limited taps are determined according to the Sampling theorem,
by their values at the instants $i/2W$, $i = 0, \pm 1, \pm 2, \ldots$.

If the memory, $L$, of the filter, $A(\lambda,t)$ is less than $1/2W$ these values are easily
determined: we put in unit impulses to $A(\lambda,t)$ at instants $0, 1/2W, 2/2W, \ldots, T$,
and read off from the responses the desired values of the impulse response $A(\lambda,t)$.

[...] If $L < 1/2W$, the responses to the different input impulses do not interfere with
one another and the above values can be unambiguously determined.

In other words, sufficiently dense samples of the tap functions can be
obtained by sending an impulse train $\sum_n \delta_{n/2W}$ through the channel. Indeed,

\[ A\Bigl(\sum_n \delta_{n/2W}\Bigr)(t) = \sum_n \int A(\lambda,t)\, \delta_{n/2W}(t-\lambda)\, d\lambda = \sum_n A(t - n/2W,\ t). \]

Evaluating the operator response at $t = \lambda_0 + n_0/2W$, $n_0 \in \mathbb{Z}$, we obtain

\[ A\Bigl(\sum_n \delta_{n/2W}\Bigr)(\lambda_0 + n_0/2W) = \sum_n A(\lambda_0 + (n_0 - n)/2W,\ \lambda_0 + n_0/2W) = A(\lambda_0,\ \lambda_0 + n_0/2W), \]

since $L < 1/2W$ implies that $A(\lambda_0 + (n_0 - n)/2W,\ \lambda_0 + n_0/2W) = 0$ if $n \ne n_0$.
In short, for each $\lambda$, the samples $A(\lambda, \lambda + n/2W)$ for $n \in \mathbb{Z}$ can be recovered.
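
A discrete-time toy model makes the non-interference argument concrete. In the sketch below, which is only an illustration under stated assumptions, time is a grid on which the sounding interval 1/2W corresponds to `P` steps, the tap function returned by `make_tap` is a made-up slowly varying example, and `memory` plays the role of L in grid steps. All names are hypothetical.

```python
import math

P = 4        # grid steps per sounding interval 1/(2W)
N = 6 * P    # simulated time steps

def make_tap(memory):
    """Hypothetical tap function A(lam, t): zero for lam outside [0, memory)."""
    def tap(lam, t):
        if not 0 <= lam < memory:
            return 0.0
        return (1.0 + lam) * math.cos(2.0 * math.pi * t / (8 * P))
    return tap

def sound(tap, memory):
    """Send the impulse train through y[t] = sum_lam A(lam, t) x[t - lam]."""
    x = [1.0 if s % P == 0 else 0.0 for s in range(N)]
    return [sum(tap(lam, t) * x[t - lam] for lam in range(memory) if t - lam >= 0)
            for t in range(N)]

def max_readout_error(memory):
    """Compare y[t] with the sample A(t mod P, t) it is supposed to reveal."""
    tap = make_tap(memory)
    y = sound(tap, memory)
    return max(abs(y[t] - tap(t % P, t)) for t in range(N))

print(max_readout_error(P))      # memory within one sounding interval
print(max_readout_error(2 * P))  # memory spans two intervals: responses overlap
```

When the memory fits inside one sounding interval, each output sample equals exactly one tap value; when it does not, responses to neighbouring impulses add and the readout is contaminated.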

The described Kailath sounding procedure is depicted in Figure 1. In this
visualization, we plot the kernel $\kappa(t,s) = A(t-s,t)$ of the operator $A$, that
is,

\[ Ax(t) = \int A(\lambda,t)\, x(t-\lambda)\, d\lambda = \int A(t-s,t)\, x(s)\, ds = \int \kappa(t,s)\, x(s)\, ds. \]

Fig. 1: Kailath sounding of $A$ with $\operatorname{supp}\mathcal{A}(\lambda,f) \subseteq [0,L] \times [-W,W]$ and
$L = 1/2W$. The kernel $\kappa(t,s)$ is displayed on the $(t,s)$ plane, the impulse
train $\sum_n \delta_{n/2W}(s)$ on the $s$-axis, and the output signal $Ax(t) =
A(\sum_n \delta_{n/2W})(t) = \sum_n A(t - n/2W,\ t) = \sum_n \kappa(t, n/2W)$. The sample values
of the tap functions $A_\lambda(t) = A(\lambda,t) = \kappa(t, t-\lambda)$ can be read off $Ax(t)$.


2.5 Necessity of Kailath’s Condition for Channel 
Identification. 

For the “simple measurement scheme” to work, BL < 1 is sufficient but could 
be restrictive. 

We need, therefore, to devise more sophisticated measurement schemes. However, 
we have not pursued this question very far because for a certain class of channels 
we can show that the condition 

$L < 1/2W$, i.e., $BL < 1$

is necessary as well as sufficient for unambiguous measurement of $A(\lambda,t)$. The class
of channels is obtained as follows: We first assume that there is a bandwidth constraint
on the possible input signals to $A(\lambda,t)$ in that the signals are restricted
to $(-W_i, W_i)$ in frequency. We can now determine a filter $A_{W_i}(\lambda,t)$ that is equivalent
to $A(\lambda,t)$ over the bandwidth $(-W_i, W_i)$, and find necessary and sufficient
conditions for unambiguous measurement of $A_{W_i}(\lambda,t)$. If we now let $W_i \to \infty$, this
condition reduces to condition (1), viz: $L < 1/2W$. Therefore, condition (1) is valid
for all filters $A(\lambda,t)$ that may be obtained as the limit of band-limited channels.
This class includes almost all filters of physical interest. The argument is worked
out in detail in Ref. 6¹ but we give a brief outline here.

The class of operators in view here can be described as limits (in some
unspecified sense) of operators whose impulse response $A(\lambda,t)$ is bandlimited
to $[-W_i, W_i]$ in $\lambda$ for each $t$ and periodic with period $T > 0$ in $t$ for each $\lambda$.
Here, $T$ is assumed to have some value larger than the maximum time over
which the channel will be operated. We could take it as the duration of the
input signal to the channel.

The restriction to input signals bandlimited to $(-W_i, W_i)$ indicates that it
suffices to know the values of $A(\lambda,t)$ or $\mathcal{A}(\lambda,f)$ for a finite set of values of $\lambda$:
$\lambda = 0, 1/2W_i, 2/2W_i, \ldots, L$, assuming for simplicity that $L$ is a multiple of
$1/2W_i$. Therefore, we can write

\[ A(\lambda,t) = \sum_n A(n/2W_i, t)\, \operatorname{sinc}_{W_i}(\lambda - n/2W_i), \]

where $\operatorname{sinc}_{W_i}(t) = \sin(2\pi W_i t)/(2\pi W_i t)$, so that as $W_i \to \infty$, $\operatorname{sinc}_{W_i}(t)$ becomes
more concentrated at the origin.

Also, $T$-periodicity in $t$ allows us to write

\[ A(\lambda,t) = \sum_k \mathcal{A}(\lambda, k/T)\, e^{2\pi i k t/T}, \]

so that combining gives

\[ A(\lambda,t) = \sum_n \sum_k \mathcal{A}(n/2W_i, k/T)\, \operatorname{sinc}_{W_i}(\lambda - n/2W_i)\, e^{2\pi i k t/T}. \]

Based on the restriction to bandlimited input signals which are $T$-periodic,
we have obtained a representation of $A$ which is neither compactly supported
in $\lambda$ nor bandlimited in $t$. However, the original restriction that

\[ \operatorname{supp} \mathcal{A}(\lambda,f) \subseteq [0,L] \times [-W,W] \]

motivates the assumption that we are working with finite sums, viz.

\[ A(\lambda,t) = \sum_{n/2W_i \in [0,L]} \ \sum_{k/T \in [-W,W]} \mathcal{A}(n/2W_i, k/T)\, \operatorname{sinc}_{W_i}(\lambda - n/2W_i)\, e^{2\pi i k t/T}. \]

This is how the author obtains the estimate that there are at most
$(2W_iL + 1)(2WT + 1)$ degrees of freedom in any impulse response $A$ in the given class.

For any input signal $x(t)$ bandlimited to $[-W_i, W_i]$, the output will be
bandlimited to $[-W - W_i, W + W_i]$. Specifically,


1 Ref. 6 is Up. 
\[ \begin{aligned}
Ax(t) &= \int A(\lambda,t)\, x(t-\lambda)\, d\lambda \\
&= \sum_{n/2W_i \in [0,L]} \ \sum_{k/T \in [-W,W]} \mathcal{A}(n/2W_i, k/T)\, e^{2\pi i k t/T} \int x(t-\lambda)\, \operatorname{sinc}_{W_i}(\lambda - n/2W_i)\, d\lambda \\
&= \sum_{n/2W_i \in [0,L]} \ \sum_{k/T \in [-W,W]} \mathcal{A}(n/2W_i, k/T)\, e^{2\pi i k t/T}\, (x * \operatorname{sinc}_{W_i})(t - n/2W_i).
\end{aligned} \]

Since $e^{2\pi i k t/T}\,(x * \operatorname{sinc}_{W_i})(t - n/2W_i)$ is bandlimited to $[-W_i, W_i] + (k/T)$ for
$k/T \in [-W, W]$, it follows that $Ax(t)$ is bandlimited to $[-W - W_i, W + W_i]$.

If we restrict our attention to signals $x(t)$ time-limited to $[0,T]$, the output
signal $Ax(t)$ will have duration $T + L$, and $Ax(\cdot)$ will be completely determined
by its samples at $n/(2(W + W_i)) \in [0, T + L]$, from which we can identify
$2(T + L)(W + W_i) + 1$ degrees of freedom.

In order for identification to be possible, the number of degrees of freedom 
of the output signal must be at least as large as the number of degrees of 
freedom of the operator, i.e. 

\[ \begin{aligned}
2W_iT + 2W_iL + 2WT + 2WL + 1 &= 2(T + L)(W_i + W) + 1 \\
&\ge (2WT + 1)(2W_iL + 1) \\
&= 2WT + 2W_iL + 1 + 4W_iWTL,
\end{aligned} \]

which reduces ultimately to

\[ \frac{1}{1 - 1/(2W_iT)} \ge 2WL = BL. \]

That is, $BL$ needs to be strictly smaller than 1 in the approximation while
$BL = 1$ may work in the limiting case $W_i \to \infty$ (and/or $T \to \infty$).
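
The reduction from the degrees-of-freedom inequality to the bound on $BL$ can be spelled out step by step. The intermediate lines below are filled in here for the reader's convenience and assume $2W_iT > 1$, so that the final division preserves the direction of the inequality.

```latex
\begin{align*}
2W_iT + 2W_iL + 2WT + 2WL + 1 &\ge 2WT + 2W_iL + 1 + 4W_iWTL \\
2W_iT + 2WL &\ge 4W_iWTL
  && \text{(cancel common terms)} \\
1 + \frac{2WL}{2W_iT} &\ge 2WL
  && \text{(divide by } 2W_iT\text{)} \\
1 &\ge 2WL\Bigl(1 - \frac{1}{2W_iT}\Bigr) \\
\frac{1}{1 - 1/(2W_iT)} &\ge 2WL = BL.
\end{align*}
```

The left-hand side of the last line tends to 1 as $W_iT \to \infty$, which is where the threshold value 1 in the condition on $BL$ comes from.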

This result got a lot of attention because it corresponded with experimental
evidence that Rake did not function well when the condition BL < 1
was violated. It led to the designation of “underspread” and “overspread” 
channels for which BL was less than or greater than 1. 


2.6 Some Remarks on Kailath’s Results 


This simple argument is surprising, particularly in light of the fact that the 
author obtained a deep result in time-frequency analysis with none of the 
tools of modern time-frequency analysis at his disposal. He very deftly uses 
the extremely useful engineering “fiction” that the dimension of the space 
of signals essentially bandlimited to $[-W, W]$ and time-limited to $[0, T]$ is
approximately $2WT$. The then recent papers of Landau, Slepian and Pollak
HUGH], which are mentioned explicitly in EH, provided a rigorous mathematical
framework for understanding the phenomenon of essentially simultaneous
band- and time-limiting. While the existence of these results lent considerable
mathematical heft to the argument, they were not incorporated into a
fully airtight mathematical proof of his theorem. 

In the proof we have used a degrees-of-freedom argument based on the sampling theorem
which assumes strictly bandlimited functions. This is an unrealistic assumption
for physical processes. It is more reasonable to call a process band (or time) limited 
if some large fraction of its energy, say 95%, is contained within a finite frequency 
(or time) region. Recent work by Landau and Slepian has shown the concept of 
approximately 2 TW degrees of freedom holds even in such cases. This leads us to 
believe that our proof of the necessity of the BL < 1 condition is not merely a 
consequence of the special properties of strictly band-limited functions. It would be 
valuable to find an alternative method of proof. 

While Kailath’s Theorem is stated for channel operators whose spreading 
functions are supported in a rectangle, it is clear that the later work of Bello 
[5] was anticipated and more general regions were in view. This is stated 
explicitly. 

We have not discussed how the bandwidth, $B$, is to be defined. There are several
possibilities: we might take the nonzero $f$-region of $\mathcal{A}(\lambda,f)$; or use a “counting”
argument. We could proceed similarly for the definition of $L$. As a result of these
several possibilities, the value 1, of the threshold in the condition BL < 1 should be 
considered only as an order of magnitude value. 

...constant and predictable variations in B and L, due for example to known Doppler 
shifts or time displacements, would yield large values for the absolute values of 
the time and frequency spreadings. However such predictable variations should be 
subtracted out before the BL product is computed; what appears to be important is 
the area covered in the time- and frequency-spreading plane rather than the absolute 
values of B and L . (emphasis added) 

The reference to “counting” as a definition of bandwidth clearly indicates 
that essentially arbitrary regions of support for the operator spreading function
were in view here, and that a necessity argument relying on degrees
of freedom and not the shape of the spreading region was anticipated. The 
third-named author did not pursue the measurement problem studied in his 
MS thesis because he went on in his PhD dissertation to study the optimum 
(in the sense of minimum probability of error) detector scheme of which Rake 
is an intelligent engineering approximation. See [52112312SI ■ 

The mathematical limitations of the necessity proof in [21] can be removed
by addressing the identification problem directly as a problem on infinite-dimensional
space rather than relying on finite-dimensional approximations
to the channel. This approach also avoids the problem of dealing with simultaneously
time- and frequency-limited functions. In this way, the proof can
be made completely mathematically rigorous. This approach is described in
Section 3.2.

2.7 Bello’s Time-Variant Channel Identification
Condition

Kailath’s Theorem was generalized by Bello in [5] along the lines anticipated
in [21]. Bello’s argument follows that of [21] in its broad outlines but with 
some significant differences. Bello clearly anticipates some of the technical 
difficulties that have been solved more recently by the authors and others 
and which have led to the general theory of operator sampling. 

Continuing with the notation of this section, Bello considers channels with
spreading function $\mathcal{A}(\lambda,f)$ supported in a rectangle $[0,L] \times [-W,W]$. If $L$
and $W$ are all that is known about the channel, then Kailath’s criterion for
measurability requires that $2WL < 1$. Bello considers channels for which
$2WL$ may be greater than 1 but for which

\[ S_A = |\operatorname{supp} \mathcal{A}(\lambda,f)| < 1 \]

and argues that this is the most appropriate criterion to assess measurability
of the channel modeled by $A$.

In order to describe Bello’s proof we will fix parameters $T \gg L$ and $W_i \gg W$
and, following the assumptions earlier in this section, assume that inputs
to the channel are time-limited to $[0,T]$ and (approximately) bandlimited to
$[-W_i, W_i]$. Under this assumption, Bello considers the spreading function of
the channel to be approximated by a superposition of point scatterers, viz.

\[ \mathcal{A}(\lambda,f) = \sum_n \sum_k A_{n,k}\, \delta(f - (k/T))\, \delta(\lambda - (n/2W_i)). \]

Hence the response of the channel to an input $x(\cdot)$ is given by

\[ \begin{aligned}
Ax(t) &= \iint x(t-\lambda)\, e^{2\pi i f (t-\lambda)}\, \mathcal{A}(\lambda,f)\, d\lambda\, df \\
&= \sum_n \sum_k A_{n,k}\, x(t - (n/2W_i))\, e^{2\pi i (k/T)(t - (n/2W_i))}.
\end{aligned} \tag{1} \]

n k 

Note that this is a continuous-time Gabor expansion with window function
$x(\cdot)$ (see, e.g., na). By standard density results in Gabor theory, the collection
of functions $\{x(t - (n/2W_i))\, e^{2\pi i (k/T)(t - (n/2W_i))}\}$ is overcomplete as
soon as $2TW_i > 1$. Consequently, without further discretization, the coefficients
$A_{n,k}$ are in principle unrecoverable. Taking into consideration support
constraints on $\mathcal{A}$, we assume that the sums are finite, viz.

\[ Ax(t) = \sum_{(n/2W_i,\ k/T) \in \operatorname{supp}\mathcal{A}} A_{n,k}\, x(t - (n/2W_i))\, e^{2\pi i (k/T)(t - (n/2W_i))}. \]

Hence determining the channel characteristics amounts to finding $A_{n,k}$ for
those pairs $(n,k)$. It should be noted that for a given spreading function
$\mathcal{A}(\lambda,f)$ for which $\operatorname{supp}\mathcal{A}$ is a Lebesgue measurable set, given $\epsilon > 0$, there
exist $T$ and $W_i$ sufficiently large that the number of such $(n,k)$ is no more
than $2S_AW_iT(1+\epsilon)$. On the other hand, for a given $T$ and $W_i$, there exist
spreading functions $\mathcal{A}(\lambda,f)$ with arbitrarily small non-convex $S_A$ for which
the number of nonzero coefficients $A_{n,k}$ can be large. For example, given $T$
and $W_i$, $S_A$ could consist of rectangles centered on the points $(n/(2W_i), k/T)$
with arbitrarily small total area.

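By sampling, (1) reduces to a discrete, bi-infinite linear system, viz.

\[ Ax\Bigl(\frac{p}{2W_i}\Bigr) = \sum_n \sum_k A_{n,k}\, x\Bigl(\frac{p-n}{2W_i}\Bigr)\, e^{2\pi i (k/T)\left(\frac{p-n}{2W_i}\right)} \tag{2} \]

for $p \in \mathbb{Z}$. Note that (2) is the expansion of a vector in a discrete Gabor system
on $\ell^2(\mathbb{Z})$, a fact not mentioned by Bello, and of which he was apparently
unaware. Specifically, writing $x(n)$ for the sample $x(n/2W_i)$ and defining the
translation operator $\mathcal{T}$ and the modulation operator $\mathcal{M}$ on $\ell^2$ by

\[ \mathcal{T}x(n) = x(n-1), \quad\text{and}\quad \mathcal{M}x(n) = e^{\pi i n/(TW_i)}\, x(n), \tag{3} \]

(2) can be rewritten as

\[ Ax\Bigl(\frac{p}{2W_i}\Bigr) = \sum_n \sum_k (\mathcal{T}^n \mathcal{M}^k x)(p)\, A_{n,k}. \tag{4} \]

Since there are only finitely many nonzero unknowns in this system, Bello’s
analysis proceeds by looking at finite sections of (4) and counting degrees of
freedom.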

Necessity. Following the lines of the necessity argument in [21], we note
that there are at least $2(T + L)(W + W_i)$ degrees of freedom in the output
vector $Ax(t)$, that is, at least that many independent samples of the form
$Ax(p/2W_i)$, and as observed above, no more than $2S_AW_iT(1+\epsilon)$ nonzero
unknowns $A_{n,k}$. Therefore, in order for the $A_{n,k}$ to be determined in principle,
it must be true that

\[ 2W_iT(1+\epsilon)\, S_A \le 2(T + L)(W + W_i), \quad\text{or}\quad S_A \le \frac{(T + L)(W + W_i)}{W_iT(1+\epsilon)}. \]

Letting $T, W_i \to \infty$ and $\epsilon \to 0$, we arrive at $S_A \le 1$.
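
How quickly the right-hand side of this bound approaches 1 can be checked numerically. The sketch below simply evaluates $(T + L)(W + W_i)/(W_iT(1+\epsilon))$ for growing $T = W_i$. The function name `area_bound` and the values $L = W = 1$ are hypothetical choices made for illustration only.

```python
def area_bound(T, L, W, Wi, eps):
    """Right-hand side (T + L)(W + Wi) / (Wi T (1 + eps)) of the bound on S_A."""
    return (T + L) * (W + Wi) / (Wi * T * (1.0 + eps))

L, W = 1.0, 1.0   # hypothetical enclosing rectangle [0, L] x [-W, W]
for scale, eps in [(10.0, 0.1), (100.0, 0.01), (1000.0, 0.001)]:
    # T and Wi grow together while eps shrinks, as in the limiting argument.
    print(scale, area_bound(T=scale, L=L, W=W, Wi=scale, eps=eps))
```

For these choices the bound decreases toward 1, which is the threshold on $S_A$ obtained in the limit.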

Sufficiency. Considering a section of the system based on the assumption
that $\operatorname{supp}\mathcal{A} \subseteq [0,L] \times [-W,W]$, the system has approximately $2W_i(T + L)$
equations in $(2W_iT)(2WL)$ unknowns. Since $L$ and $2W$ are simply the
dimensions of a rectangle that encloses the support of $\mathcal{A}$, $2WL$ may be quite
large and independent of $S_A$. Hence the system will not in general be solvable.
However, by assuming that $S_A < 1$, only approximately $S_A(2W_iT)$ of the
$A_{n,k}$ do not vanish and the system reduces to one in which the number of
equations is roughly equal to the number of unknowns. In this case it would
be possible to solve (4) as long as the collection of appropriately truncated
vectors $\{\mathcal{T}^n \mathcal{M}^k x : A_{n,k} \ne 0\}$ forms a linearly independent set for some vector
$x$.

In his paper, Bello was dealing with independence properties of discrete 
Gabor systems apparently without realizing it, or at least without stating it 
explicitly. Indeed, he argues in several different ways that a vector $x$ that produces
a linearly independent set should exist, and intriguingly suggests that
a vector consisting of ±1 should exist with the property that the Grammian
of the Gabor matrix corresponding to the section of (4) being considered is
diagonally dominant. 

The setup chosen below to prove Bello’s assertion leads to the consideration 
of a matrix whose columns stem from a Gabor system on a finite-dimensional 
space, not on a sequence space. 


3 Operator Sampling 


The first key contribution of operator sampling is the use of frame theory
and time-frequency analysis to remove assumptions of simultaneous band-
and time-limiting, and also to deal with the infinite number of degrees of
freedom in a functional analytic setting (Section 3.1). A second key insight
is the development of a “simple measurement scheme” of the type used by
the third-named author but that allows for the difficulties identified by Bello
to be resolved. This insight is the use of periodically-weighted delta-trains
as measurement functions for a channel. Such measurement functions have
three distinct advantages.

First, they allow for the channel model to be essentially arbitrary and clarify
the reduction of the operator identification problem to a finite-dimensional
setting without imposing a finite-dimensional model that approximates the
channel. Second, they combine the naturalness of the simple measurement
scheme described earlier with the flexibility of Bello’s idea for measuring
channels with arbitrary spreading support. Third, they establish a connection
between identification of channels and finite-dimensional Gabor systems
and allow us to determine windowing vectors with appropriate independence
properties.

In Section 3.1 we introduce some operator-theoretic descriptions of some
of the operator classes that we are able to identify, and discuss briefly different
ways of representing such operators. Such a discussion is beneficial in several
ways. First, it contains a precise definition of identifiability, which comes into
play when considering the generalization of the necessity condition for so-called
overspread channels (Section 3.2). Second, we can extend the necessity
condition to a very large class of inputs. In other words, we can assert that
in a very general sense, no input can identify an overspread channel. Third,
it allows us to include both convolution operators and multiplication operators
(for which the spreading functions are distributions) in the operator
sampling theory. The identification of multiplication operators via operator
sampling reduces to the classical sampling formula, thereby showing that
classical sampling is a special case of operator sampling. In Section 3.2 we
present a natural formalization of the original necessity proof of [21] (Section 2.5)
in the infinite-dimensional setting, which involves an interpretation
of the notion of an under-determined system in that setting. Finally, in Section 3.3
we present the scheme given first in [42, 49] for the identification of
operator classes using periodically-weighted delta trains and techniques from
modern time-frequency analysis.


3.1 Operator classes and operator identification 


We formally consider an arbitrary operator as a pseudodifferential operator
represented by

$Hf(x) = \int \sigma_H(x,\xi)\, \hat{f}(\xi)\, e^{2\pi i x \xi}\, d\xi$, (5)

where $\sigma_H(x,\xi) \in L^2(\mathbb{R}^2)$ is the Kohn-Nirenberg (KN) symbol of $H$. The
spreading function $\eta_H(t,\nu)$ of the operator $H$ is the symplectic Fourier transform
of the KN symbol, viz.

$\eta_H(t,\nu) = \iint \sigma_H(x,\xi)\, e^{-2\pi i(\nu x - \xi t)}\, dx\, d\xi$ (6)

and we have the representation

$Hf(x) = \iint \eta_H(t,\nu)\, T_t M_\nu f(x)\, d\nu\, dt$ (7)

where $T_t f(x) = f(x-t)$ is the time-shift operator and $M_\nu f(x) = e^{2\pi i \nu x} f(x)$
is the frequency-shift operator.

This is identical to the representation given in [21] where $\eta_H(t,\nu) = A(\nu,t)$; see Section 2.4.


To see more clearly where the spreading function arises in the context of
communication theory, we can define the impulse response of the channel
modeled by $H$, denoted $h_H(x,t)$, by

$Hf(x) = \int h_H(x,t)\, f(x-t)\, dt$.

Note that if $h_H$ were independent of $x$, then $H$ would be a convolution
operator and hence a model for a time-invariant channel. In fact, with $\kappa_H(x,t)$
being the kernel of the operator $H$,

$Hf(x) = \int \kappa_H(x,t)\, f(t)\, dt$ (8)

$= \int h_H(x,t)\, f(x-t)\, dt$ (9)

$= \iint \eta_H(t,\nu)\, e^{2\pi i \nu (x-t)}\, f(x-t)\, d\nu\, dt$ (10)

$= \int \sigma_H(x,\xi)\, \hat{f}(\xi)\, e^{2\pi i x \xi}\, d\xi$, (11)

where

$h_H(x,t) = \kappa_H(x, x-t)$

$= \int \sigma_H(x,\xi)\, e^{2\pi i \xi t}\, d\xi$,

$= \int \eta_H(t,\nu)\, e^{2\pi i \nu (x-t)}\, d\nu$. (12)

With this interpretation, the maximum support of $\eta_H(t,\nu)$ in the first variable
corresponds to the maximum spread of a delta impulse sent through
the channel and the maximum support of $\eta_H(t,\nu)$ in the second variable
corresponds to the maximum spread of a pure frequency sent through the
channel.
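The equivalence of the time-frequency shift representation and the impulse response representation can be checked in a finite setting. The sketch below replaces $\mathbb{R}$ by the cyclic group $\mathbb{Z}_N$, which is an assumption made purely for illustration; the spreading coefficients in `eta` and all function names are made up.

```python
import cmath

N = 8  # length of the cyclic signal space (illustrative choice)

def shift_mod(f, t, nu):
    """Return T_t M_nu f on Z_N: first modulate by nu, then delay by t."""
    modulated = [cmath.exp(2j * cmath.pi * nu * n / N) * f[n] for n in range(N)]
    return [modulated[(n - t) % N] for n in range(N)]

# A sparse, made-up spreading function eta[(t, nu)] with small support.
eta = {(0, 0): 1.0, (1, 2): 0.5j, (2, 7): -0.25}

def apply_spreading(f):
    """H f = sum over (t, nu) of eta(t, nu) T_t M_nu f, a discrete analogue of (7)."""
    out = [0j] * N
    for (t, nu), weight in eta.items():
        shifted = shift_mod(f, t, nu)
        out = [o + weight * s for o, s in zip(out, shifted)]
    return out

def impulse_response(x, t):
    """h(x, t) = sum over nu of eta(t, nu) e^{2 pi i nu (x - t) / N}, as in (12)."""
    return sum(w * cmath.exp(2j * cmath.pi * nu * (x - t) / N)
               for (tt, nu), w in eta.items() if tt == t)

def apply_impulse_response(f):
    """H f(x) = sum over t of h(x, t) f(x - t), a discrete analogue of (9)."""
    return [sum(impulse_response(x, t) * f[(x - t) % N] for t in range(N))
            for x in range(N)]

f = [complex(n, -n * n) for n in range(N)]
lhs = apply_spreading(f)
rhs = apply_impulse_response(f)
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap)  # agreement up to floating-point rounding
```

In this toy model the largest delay occurring in `eta` plays the role of the time spread and the set of occurring frequencies that of the Doppler spread.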

Since we are interested in operators whose spreading functions have small
support, it is natural to define the following operator classes, called operator
Paley-Wiener spaces (see [41]).

Definition 1. For $S \subseteq \mathbb{R}^2$, we define the operator Paley-Wiener spaces
$OPW(S)$ by

$OPW(S) = \{H \in \mathcal{L}(L^2(\mathbb{R}), L^2(\mathbb{R})) : \operatorname{supp} \eta_H \subseteq S,\ \|\sigma_H\|_{L^2} < \infty\}$.

Remark 1. In [41], the spaces $OPW^{p,q}(S)$, $1 \le p, q \le \infty$, were considered,
where $L^2$-membership of $\sigma_H$ is replaced by finiteness of

$\|\sigma_H\|_{L^{p,q}} = \left( \int \left( \int |\sigma_H(x,\xi)|^q\, d\xi \right)^{p/q} dx \right)^{1/p}$

with the usual adjustments made when either $p = \infty$ or $q = \infty$. $OPW^{p,q}(S)$
is a Banach space with respect to the norm $\|H\|_{OPW^{p,q}} = \|\sigma_H\|_{L^{p,q}}$. Note
that if $S$ is bounded, then $OPW^{\infty,\infty}(S)$ consists of all bounded operators
whose spreading function is supported on $S$. In fact, the operator norm is
then equivalent to the $OPW^{\infty,\infty}(S)$ norm, where the constants depend on $S$.

The general definition is beneficial since it also allows the inclusion of
convolution operators with kernels whose Fourier transforms lie in $L^q(\mathbb{R})$
($OPW^{\infty,q}(\mathbb{R})$) and multiplication operators whose multiplier is in $L^p(\mathbb{R})$
($OPW^{p,\infty}(\mathbb{R})$).

The goal of operator identification is to find an input signal $g$ such that
each operator $H$ in a given class is completely and stably determined by
$Hg$. In other words, we ask that the mapping $H \mapsto Hg$ be continuous and
bounded below on its domain. In our setting, this translates to the existence
of $c_1, c_2 > 0$ such that

$c_1 \|\sigma_H\|_{L^2} \le \|Hg\|_{L^2} \le c_2 \|\sigma_H\|_{L^2}$, $H \in OPW(S)$. (13)

This definition of identifiability of operators originated in [26]. Note that
(13) implies that the mapping $H \mapsto Hg$ is injective, that is, that $Hg = 0$
implies that $H = 0$, but is not equivalent to it. The inequality (13) adds to
injectivity the assertion that $H$ is also stably determined by $Hg$ in the sense
that a small change in the output $Hg$ would correspond to a small change
in the operator $H$. Such stability is also necessary for the existence of an
algorithm that will reliably recover $H$ from $Hg$. In this scheme, $g$ is referred
to as an identifier for the operator class $OPW(S)$ and if (13) holds, we say
that operator identification is possible.

In trying to find an explicit expression for an identifier, we use as a starting
point the “simple measurement scheme” of [21], in which $g$ is a delta train,
viz. $g = \sum_n \delta_{nT}$ for some $T > 0$. In the framework of operator identification
the channel measurement criterion in [21] takes the following form.


Theorem 1. For $H \in OPW([0,T] \times [-\Omega/2, \Omega/2])$ with $T\Omega \le 1$, we have

$\big\| H \sum_{k\in\mathbb{Z}} \delta_{kT} \big\|_{L^2(\mathbb{R})} = T\, \|\sigma_H\|_{L^2}$,

and $H$ can be reconstructed by means of

$\kappa_H(x+t, x) = \chi_{[0,T]}(t) \sum_{n\in\mathbb{Z}} \Big( H \sum_{k\in\mathbb{Z}} \delta_{kT} \Big)(t+nT)\, \frac{\sin(\pi T(x-n))}{\pi T(x-n)}$ (14)

where $\chi_{[0,T]}(t) = 1$ for $t \in [0,T]$ and $0$ elsewhere and with convergence in the
$L^2$ norm and uniformly in $x$ for every $t$.

As was observed earlier, the key feature of this scheme is that the spacing 
of the deltas in the identifier is sufficiently large so as to allow the response 
of the channel to a given delta to “die out” before the next delta is sent. In 
other words, the parameter T must exceed the time-spread of the channel. 
On the other hand, the rate of change of the channel, as measured by its
bandwidth $\Omega$, must be small enough that its impulse response can be recovered
from “samples” of the channel taken $T$ time units apart. In particular,
the samples of the impulse response $T$ units apart can be easily determined
from the output. In the general case considered by Bello, in which the spreading
support of the operator is not contained in a rectangle of unit area, this
intuition breaks down.
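The "die out before the next delta" intuition can be illustrated with a discrete-time toy model. The sketch below is not the scheme of [21] itself: time is an integer sample index, the impulse response `h` is a made-up example, and `sound_with_delta_train` is a hypothetical helper name.

```python
def sound_with_delta_train(h, spread, spacing, length):
    """Output Hg(x) = sum over t of h(x, t) g(x - t) for a unit delta train g.

    h(x, t) is assumed to vanish unless 0 <= t < spread.
    """
    out = []
    for x in range(length):
        total = 0.0
        for t in range(spread):
            if (x - t) % spacing == 0:  # g(x - t) = 1 exactly on multiples of spacing
                total += h(x, t)
        out.append(total)
    return out

# Hypothetical slowly varying impulse response with a time spread of 3 samples.
def h(x, t):
    return (t + 1) * (1.0 + 0.01 * x)

spread = 3
ok = sound_with_delta_train(h, spread, spacing=4, length=12)   # spacing >= spread
bad = sound_with_delta_train(h, spread, spacing=2, length=12)  # spacing < spread

# With spacing >= spread each output sample is a single tap h(x, x mod spacing).
print([round(v, 2) for v in ok])
# With spacing < spread responses to neighbouring deltas overlap and taps are summed.
print([round(v, 2) for v in bad])
```

In the first case the output directly lists samples of the impulse response taken `spacing` units apart; in the second case individual taps can no longer be read off.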

Specifically, suppose that we consider the operator class $OPW(S)$ where
$S \subseteq [0,T_0] \times [-\Omega_0/2, \Omega_0/2]$ and $T_0\Omega_0 \gg 1$ but where $|S| < 1$. Then sounding
the channel with a delta train of the form $g = \sum_n \delta_{nT_0}$ would severely undersample
the impulse response function. Simply increasing the sampling rate,
however, would produce overlap in the responses of the channel to deltas close 
to each other. An approach to the undersampling problem in the literature 
of classical sampling theory is to sample at the low rate transformed versions 
of the function, chosen so that the interference of the several undersampled 
functions can be dealt with. This idea has its most classical expression in the 
Generalized Sampling scheme of Papoulis [M]. Choosing shifts and constant 
multiples of our delta train results in an identifier of the form $g = \sum_n c_n \delta_{nT}$
where the weights $(c_n)$ have period $P$ (for some $P \in \mathbb{N}$) and $T > 0$ satisfies
$PT > T_0$.

If g is discretely supported (for example, a periodically-weighted delta- 
train), then we refer to operator identification as operator sampling. The 
utility of periodically-weighted delta trains for operator identification is a 
cornerstone of operator sampling and has far-reaching implications culminating
in the developments outlined in Sections 5 and 6.


3.2 Kailath’s necessity proof and operator 
identification 

In Section 2.5 we presented the proof of the necessity of the condition $BL \le 1$
for channel identification as given in [21]. The argument consisted of finding
a finite-dimensional approximation of the channel $H$, and then showing that,
given any putative identifier $g$, the number of degrees of freedom present in
the output $Hg$ must be at least as large as the number of degrees of freedom
in the channel itself. For this to be true in any finite-dimensional setting, we
must have $BL < 1$ and so in the limit we require $BL \le 1$. In essence, if
$BL > 1$, we have a linear system with fewer equations than unknowns which
necessarily has a nontrivial nullspace. The generalization of this notion to the
infinite-dimensional setting is the basis of the necessity proof that appears in
[26]. In this section, we present an outline of that proof, and show how the
natural tool for this purpose once again comes from time-frequency analysis.

To see the idea of the proof, assume that $BL > 1$ and for simplicity let
$S = [-\frac{L}{2}, \frac{L}{2}] \times [-\frac{B}{2}, \frac{B}{2}]$. The goal is to show that for any sounding signal $s$ in
an appropriately large space of distributions (see footnote 2), the operator $\Phi_s : OPW(S) \to
L^2(\mathbb{R})$, $H \mapsto Hs$, is not stable, that is, it does not possess a lower bound in
the inequality (13).

First, define the operator $E : \ell_0(\mathbb{Z}^2) \to OPW(S)$, where $\ell_0(\mathbb{Z}^2)$ is the
space of finite sequences equipped with the $\ell^2$ norm, by

$E(\sigma) = E(\{\sigma_{k,l}\}) = \sum_{k,l} \sigma_{k,l}\, M_{k\lambda/L}\, T_{l\lambda/B}\, P\, T_{-l\lambda/B}\, M_{-k\lambda/L}$

where $\lambda > 1$ is chosen so that $1 < \lambda^4 < BL$ and where $P$ is a time-frequency
localization operator whose spreading function $\eta_P(t,\nu)$ is infinitely differentiable,
supported in $S$, and identically one on $[-\frac{L}{2\lambda}, \frac{L}{2\lambda}] \times [-\frac{B}{2\lambda}, \frac{B}{2\lambda}]$. It is easily
seen that the operator $E$ is well-defined and has spreading function

$\eta_{E(\sigma)}(t,\nu) = \eta_P(t,\nu) \sum_{k,l} \sigma_{k,l}\, e^{2\pi i (k\lambda t/L - l\lambda\nu/B)}$.

By construction, it follows that for some constant $c_1$, $\|E(\sigma)\|_{OPW(S)} \ge
c_1 \|\sigma\|_{\ell^2(\mathbb{Z}^2)}$ for all $\sigma$, and that for any distribution $s$, $Ps$ decays rapidly in
time and in frequency.

Next define the Gabor analysis operator $C_g : L^2(\mathbb{R}) \to \ell^2(\mathbb{Z}^2)$ by

$C_g(s) = \{\langle s, M_{k\lambda^2/L}\, T_{l\lambda^2/B}\, g \rangle\}_{k,l\in\mathbb{Z}}$,

where $g(x) = e^{-\pi x^2}$. A well-known theorem in Gabor theory asserts that
$\{M_{k\alpha} T_{l\beta}\, g\}_{k,l\in\mathbb{Z}}$ is a Gabor frame for $L^2(\mathbb{R})$ for every $\alpha\beta < 1$.
Consequently $C_g$ satisfies, for some $c_2 > 0$, $\|C_g(s)\|_{\ell^2(\mathbb{Z}^2)} \ge c_2 \|s\|_{L^2(\mathbb{R})}$ for all
$s$, since $\lambda^2/L \cdot \lambda^2/B = \lambda^4/BL < 1$.

For any $s$, consider the composition operator

$C_g \circ \Phi_s \circ E : \ell_0(\mathbb{Z}^2) \to \ell^2(\mathbb{Z}^2)$.

The crux of the proof lies in showing that this composition operator is not
stable, that is, it does not have a lower bound. Since $C_g$ and $E$ are both
bounded below, it follows that $\Phi_s$ cannot be stable. Since $s \in S_0'(\mathbb{R})$ was
arbitrary, this completes the proof.

To complete this final step we examine the canonical bi-infinite matrix
representation of the above defined composition of operators, that is, the
matrix $M = (m_{k',l',k,l})$ that satisfies

$(C_g \circ \Phi_s \circ E(\sigma))_{k',l'} = \sum_{k,l} m_{k',l',k,l}\, \sigma_{k,l}$.

It can be shown that $M$ has the property that for some rapidly decreasing
function $w(x)$,

$|m_{k',l',k,l}| \le w(\max\{|\lambda k' - k|, |\lambda l' - l|\})$. (15)

The proof is completed by the following Lemma. Its proof can be found in
[26] and generalizations can be found in [55].

2 $S_0'(\mathbb{R})$, the dual space of the Feichtinger algebra $S_0(\mathbb{R})$, or $\mathcal{S}'(\mathbb{R})$, the space of
tempered distributions. These spaces are large enough to contain weighted infinite
sums of delta distributions.


Fig. 2: A $1/\lambda$-slanted matrix $M$. The matrix is dominated by entries on a
slanted diagonal of slope $1/\lambda$.


Lemma 1. Given $M = (m_{j',j})_{j',j\in\mathbb{Z}^2}$. If there exists a monotonically decreasing
function $w : \mathbb{R}_0^+ \to \mathbb{R}_0^+$ with $w = O(x^{-2-\delta})$, $\delta > 0$, and constants
$\lambda > 1$ and $K_0 > 0$ with $|m_{j',j}| \le w(\|\lambda j' - j\|_\infty)$ for $\|\lambda j' - j\|_\infty > K_0$, then
$M$ is not stable.

Intuitively, this result asserts that a bi-infinite matrix whose entries decay
rapidly away from a skew diagonal behaves like a finite matrix with more
columns than rows (see Figure 2). Such a matrix will always have a nontrivial
nullspace. In the case of an infinite matrix what can be shown is that at best
its inverse will be unbounded.

We can make a more direct connection from this proof to the original necessity
argument in [21] in the following way. If we restrict our attention to
sequences $\{\sigma_{k,l}\}$ with a fixed finite support of size say $N$, then the image of
this subspace of sequence space under the mapping $E$ is an $N$-dimensional
subspace of $OPW(S)$. The operator $P$ is essentially a time-frequency localization
operator. This fact is established in [26] and follows from the rapid decay
of the Fourier transform of $\eta_P$. Since $\eta_P$ itself is concentrated on a rectangle
of area $BL/\lambda^2$, its Fourier transform will be concentrated on a rectangle of
area $\lambda^2/BL$. From this it follows that for $\sigma$ as described above, the operator
$E(\sigma)$ essentially localizes a function to a region in the time-frequency plane
of area $N(\lambda^2/BL)$.

Considering now the Gabor analysis operator $C_g$, we observe that the
Gaussian $g(x)$ essentially occupies a time-frequency cell of area 1, and that
this function is shifted in the time-frequency plane by integer multiples of
$(\lambda^2/B, \lambda^2/L)$. Hence to “cover” a region in the time-frequency plane of area
$N(\lambda^2/BL)$ would require only about

$\frac{N(\lambda^2/BL)}{\lambda^4/BL} = \frac{N}{\lambda^2}$

time-frequency shifts. So roughly speaking, in order to resolve $N$ degrees of
freedom in the operator $E(\sigma)$, we have only $N/\lambda^2 < N$ degrees of freedom
in the output $E(\sigma)s$.


3.3 Identification of operator Paley-Wiener spaces by
periodically weighted delta-trains

Theorem 1 is based on arguments outlined in Section 2.4 and applies only to
$OPW(S)$ if $S$ is contained in a rectangle of area less than or equal to one.
In the following, we will develop the tools that allow us to identify OPW(S) 
for any compact set S of Lebesgue measure less than one. 

In our approach we discretize the channel by covering the spreading sup¬ 
port S with small rectangles of fixed sidelength, which we refer to as a recti¬ 
fication of S. As long as the measure of S is less than one, it is possible to do 
this in such a way that the total area of the rectangles is also less than one. 
This idea seems to bear some similarity to Bello’s philosophy of sampling 
the spreading function on a fixed grid but with one fundamental difference. 
Bello’s approach is based on replacing t and x by samples, thereby approxi¬ 
mating the channel. For a better approximation, sampling on a finer grid is 
necessary, which results in a larger system of equations that must be solved. 
In our approach, as soon as the total area of the rectification is less than one, 
the operator modeling the channel is completely determined by the discrete 
model. Once this is achieved, identification of the channel reduces to solving 
a single linear system of equations at each point. 
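The rectification step can be made concrete with exact rational arithmetic. The sketch below covers a support set, given as a union of axis-parallel boxes, by grid cells of size $T \times 1/(TP)$; the set `S`, the grid anchored at the origin, and the helper name `rectify` are all illustrative assumptions.

```python
from fractions import Fraction as Fr
from math import ceil, floor

def rectify(rectangles, T, Omega):
    """Return the set of grid cells (q, m) that meet the union of the rectangles.

    Cell (q, m) stands for [q*T, (q+1)*T] x [m*Omega, (m+1)*Omega].
    Each rectangle is (t_lo, t_hi, nu_lo, nu_hi) with t_lo < t_hi and nu_lo < nu_hi.
    """
    cells = set()
    for t_lo, t_hi, nu_lo, nu_hi in rectangles:
        for q in range(floor(t_lo / T), ceil(t_hi / T)):
            for m in range(floor(nu_lo / Omega), ceil(nu_hi / Omega)):
                cells.add((q, m))
    return cells

# Made-up spreading support: two small boxes whose bounding rectangle has area > 1.
S = [(Fr(0), Fr(1, 2), Fr(0), Fr(3, 10)),
     (Fr(3, 2), Fr(19, 10), Fr(6, 5), Fr(29, 20))]
T, P = Fr(1, 2), 4
Omega = 1 / (T * P)  # cell height, so that every cell has area 1/P

cells = rectify(S, T, Omega)
area = len(cells) * T * Omega
print(sorted(cells), area)
```

Here the bounding rectangle $[0, 19/10] \times [0, 29/20]$ has area larger than one, while the rectification uses two cells of area $1/4$ each, so the count of active cells stays below $P$.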

Given parameters $T > 0$ and $P \in \mathbb{N}$, we assume that $S$ is rectified by
rectangles of size $T \times \Omega$, where $\Omega = 1/(TP)$, such that the total area of
the rectangles is less than one. Given a period-$P$ sequence $c = (c_n)_{n\in\mathbb{Z}}$, we
then define the periodically weighted delta-train $g$ by $g = \sum_{n\in\mathbb{Z}} c_n \delta_{nT}$. The
goal of this subsection is to describe the scheme by which a linear system of
$P$ equations in a priori $P^2$ unknowns can be derived by which an operator
$H \in OPW(S)$ can be completely determined by $Hg(x)$. In this sense, the
“degrees of freedom” in the operator class $OPW(S)$, and that of the output
function $Hg(x)$ are precisely defined and can be effectively compared.

Fig. 3: A set not satisfying Kailath’s condition is rectified with $1/(T\Omega) =
P \in \mathbb{N}$, the rectification has area $< 1$, $\Omega_{\max} < 1/T$, and $T_{\max} < 1/\Omega$.

The basic tool of time-frequency analysis that makes this possible is the 
Zak transform (see HU). 

Definition 2. The non-normalized Zak Transform is defined for $f \in \mathcal{S}(\mathbb{R})$ (see footnote 3)
and $a > 0$ by

$Z_a f(t,\nu) = \sum_{n\in\mathbb{Z}} f(t - an)\, e^{2\pi i a n \nu}$.

$Z_a f(t,\nu)$ satisfies the quasi-periodicity relations

$Z_a f(t+a, \nu) = e^{2\pi i a \nu}\, Z_a f(t,\nu)$

and

$Z_a f(t, \nu + 1/a) = Z_a f(t,\nu)$.

$\sqrt{a}\, Z_a$ can be extended to a unitary operator from $L^2(\mathbb{R})$ onto $L^2([0,a] \times [0,1/a])$.
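The two quasi-periodicity relations can be verified numerically for a rapidly decaying function. The sketch below uses a truncated sum and a Gaussian test function; the truncation length, the test point, and the function names are arbitrary choices made for illustration.

```python
import cmath
import math

def zak(f, a, t, nu, terms=40):
    """Truncated non-normalized Zak transform: sum over n of f(t - a n) e^{2 pi i a n nu}."""
    return sum(f(t - a * n) * cmath.exp(2j * math.pi * a * n * nu)
               for n in range(-terms, terms + 1))

def gaussian(x):
    return math.exp(-math.pi * x * x)

a, t, nu = 1.5, 0.3, 0.2  # arbitrary test point
z = zak(gaussian, a, t, nu)

# Quasi-periodicity in t and periodicity in nu.
shift_t = zak(gaussian, a, t + a, nu)
shift_nu = zak(gaussian, a, t, nu + 1.0 / a)
print(abs(shift_t - cmath.exp(2j * math.pi * a * nu) * z))
print(abs(shift_nu - z))
```

Both printed gaps are at the level of rounding error because the Gaussian makes the truncation error negligible.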

A somewhat involved but elementary calculation yields the following (see
[50] and Section 7.1).


Lemma 2. Let $T > 0$, $P \in \mathbb{N}$, $c = (c_n)$, and $g$ be given as above. Then for
all $(t,\nu) \in \mathbb{R}^2$, and $p = 0, 1, \dots, P-1$,

$e^{-2\pi i \nu T p}\, (Z_{TP} \circ H) g(t + Tp, \nu)$

$= \Omega \sum_{q,m=0}^{P-1} (\mathcal{T}^q \mathcal{M}^m c)_p\, e^{-2\pi i \nu T q}\, \eta_H^{QP}(t + Tq, \nu + m/TP)$. (16)

Here $\mathcal{T}$ and $\mathcal{M}$ are the translation and modulation operators given in Definition 3,
and $\eta_H^{QP}(t,\nu)$ is the quasiperiodization of $\eta_H$,

$\eta_H^{QP}(t,\nu) = \sum_k \sum_\ell \eta_H(t + kTP, \nu + \ell/T)\, e^{-2\pi i \nu k T P}$ (17)

whenever the sum is defined.

3 $\mathcal{S}(\mathbb{R})$ denotes the Schwartz class of infinitely-differentiable, rapidly-decreasing functions.



Fig. 4: Channel sounding of $OPW([0,2/3] \times [-1/4,1/4] \cup
[4/3,2] \times [-1/2,1/2])$ using a $P$-periodically weighted delta train $g$. The
kernel $\kappa(x,y)$ takes values on the $(x,y)$-plane, the sounding signal $g$, a
weighted impulse train, is defined on the $y$-axis, and the output signal
$Hg(x) = \int \kappa(x,y)\, g(y)\, dy$ is displayed on the $x$-axis. Here, the sample
values of the tap functions $h(x,t) = \kappa(x, x-t)$ are not easily read off
the response $Hg(x)$ as, for example, for $x \in [2T, 3T] = [4/3, 2]$ we have
$Hg(x) = 0.7\kappa(x,0) + 0.6\kappa(x,2T) = 0.7h(x,x) + 0.6h(x, x-2T)$. In detail, we
have $g = \dots + 0.7\delta_{-2} + 0.5\delta_{-4/3} + 0.6\delta_{-2/3} + 0.7\delta_0 + 0.5\delta_{2/3} + 0.6\delta_{4/3} +
0.7\delta_2 + 0.5\delta_{8/3} + \dots$, so $P = 3$, $T = 2/3$, $\Omega = 1/PT = 1/2$, $c_n = 0.7$ if $n
\bmod 3 = 0$, $c_n = 0.5$ if $n \bmod 3 = 1$, $c_n = 0.6$ if $n \bmod 3 = 2$.


Under the additional simplifying assumption that the spreading function
$\eta_H(t,\nu)$ is supported in the large rectangle $[0,TP] \times [0,1/T]$, and by restricting
(16) to the rectangle $[0,T] \times [0,1/(TP)]$, we arrive at the $P \times P^2$ linear
system

$Z_{Hg}(t,\nu)_p = \Omega \sum_{q,m=0}^{P-1} G(c)_{p,(q,m)}\, \eta_H(t,\nu)_{(q,m)}$ (18)

where

$Z_{Hg}(t,\nu)_p = (Z_{TP} \circ H) g(t + pT, \nu)\, e^{-2\pi i \nu p T}$ (19)

$\eta_H(t,\nu)_{(q,m)} = \eta_H(t + qT, \nu + m/TP)\, e^{-2\pi i \nu q T}\, e^{-2\pi i q m/P}$, (20)

and where $G(c)$ is a finite Gabor system matrix (23). If (18) can be solved for
each $(t,\nu) \in [0,T] \times [0,1/(TP)]$, then the spreading function for an operator
$H$ can be completely determined by its response to the periodically-weighted
delta-train $g$.

As anticipated by Bello, two issues now become relevant. (1) We require
that $\operatorname{supp} \eta_H$ occupy no more than $P$ of the shifted rectangles $[0,T] \times
[0,1/(TP)] + (qT, k/(TP))$, so that (18) has at least as many equations as
unknowns. This forces $|\operatorname{supp} \eta_H| \le 1$. (2) We require that $c$ be chosen in
such a way that the $P \times P$ system formed by removing the columns of $G(c)$
corresponding to vanishing components of $\eta_H$ is invertible. That such $c$ exist
is a fundamental cornerstone of operator sampling and is the subject of the
next section.
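The algebra behind the two requirements can be illustrated by a toy computation. The sketch below builds the $P \times P^2$ Gabor matrix for $P = 2$, assumes that only $P$ of the $P^2$ unknowns are active, and recovers them from $P$ measurements. It keeps only the linear-algebra core: the factor $\Omega$ and the phase factors attached to the unknowns in the text are omitted, and the window `c`, the support, and the unknown values are made up.

```python
import cmath

def gabor_matrix(c):
    """P x P^2 matrix [D_0 W | D_1 W | ... | D_{P-1} W] with D_k = diag(T^k c)."""
    P = len(c)
    G = [[0j] * (P * P) for _ in range(P)]
    for k in range(P):          # block index: translation by k
        for m in range(P):      # column inside the block: modulation by m
            for n in range(P):  # row index
                G[n][k * P + m] = c[(n - k) % P] * cmath.exp(2j * cmath.pi * n * m / P)
    return G

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("selected columns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

c = [1.0, 2.0]                      # illustrative window for P = 2
G = gabor_matrix(c)
support = [0, 3]                    # assumed-known active columns (at most P of them)
eta_true = [0.5 - 1j, 2.0 + 0.25j]  # made-up values of the active unknowns

# Forward model: P measurements generated by the active unknowns only.
z = [sum(G[n][j] * v for j, v in zip(support, eta_true)) for n in range(len(c))]

# Recovery: delete the columns of vanishing unknowns and invert the P x P block.
G_sub = [[G[n][j] for j in support] for n in range(len(c))]
eta_rec = solve(G_sub, z)
print(eta_rec)
```

If more than $P$ columns were active, `G_sub` would no longer be square and the unknowns could not be determined from $P$ measurements, which is the counting constraint in item (1).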

Based on the existence of $c$ such that any set of $P$ columns of $G(c)$ form
a linearly independent set, we can prove the following [49].


Theorem 2. For $S \subseteq (0,\infty) \times \mathbb{R}$ compact with $|S| < 1$, there exists $T > 0$
and $P \in \mathbb{N}$, and a period-$P$ sequence $c = (c_n)$ such that $g = \sum_n c_n \delta_{nT}$
identifies $OPW(S)$. In particular, there exist period-$P$ sequences $b_j = (b_{j,k})$,
and integers $0 \le q_j, m_j \le P-1$, for $0 \le j \le P-1$ such that

$h(x,t) = e^{-\pi i t/T} \sum_k \sum_{j=0}^{P-1} \Big[ b_{j,k}\, Hg(t - (q_j - k)T)\, e^{2\pi i m_j (x-t)/PT}\, \phi((x-t) + (q_j - k)T)\, r(t - q_j T) \Big]$ (21)

where $r, \phi \in \mathcal{S}(\mathbb{R})$ satisfy

$\sum_{k\in\mathbb{Z}} r(t + kT) = 1 = \sum_{n\in\mathbb{Z}} \hat{\phi}(\gamma + n/PT)$, (22)

where $r(t)\hat{\phi}(\gamma)$ is supported in a neighborhood of $[0,T] \times [0,1/PT]$, and where
the sum in (21) converges unconditionally in $L^2$ and for each $t$ uniformly in $x$.

Equation (21) is a generalization of (14), which is easily seen by choosing
$\phi(x) = \sin(\pi PTx)/(\pi PTx)$ and $r(t)$ to be the characteristic function of $[0,T)$.


4 Linear Independence Properties of Gabor Frames

4.1 Finite Gabor Frames

Definition 3. Given $P \in \mathbb{N}$, let $\omega = e^{2\pi i/P}$ and define the translation operator
$\mathcal{T}$ on $(x_0, \dots, x_{P-1}) \in \mathbb{C}^P$ by

$\mathcal{T}x = (x_{P-1}, x_0, x_1, \dots, x_{P-2})$,

and the modulation operator $\mathcal{M}$ on $\mathbb{C}^P$ by

$\mathcal{M}x = (\omega^0 x_0, \omega^1 x_1, \dots, \omega^{P-1} x_{P-1})$.

Given a vector $c \in \mathbb{C}^P$ the finite Gabor system with window $c$ is the collection
$\{\mathcal{T}^q \mathcal{M}^p c\}_{q,p=0}^{P-1}$. Define the full Gabor system matrix $G(c)$ to be the $P \times P^2$
matrix

$G(c) = [\, D_0 W_P \mid D_1 W_P \mid \cdots \mid D_{P-1} W_P \,]$ (23)

where $D_k$ is the diagonal matrix with diagonal

$\mathcal{T}^k c = (c_{P-k}, \dots, c_{P-1}, c_0, \dots, c_{P-k-1})$,

and $W_P$ is the $P \times P$ Fourier matrix $W_P = (e^{2\pi i n m/P})_{n,m=0}^{P-1}$.

Remark 2. (1) For $0 \le q, p \le P-1$, the $(q+1)$st column of the submatrix
$D_p W_P$ is the vector $\mathcal{M}^q \mathcal{T}^p c$ where the operators $\mathcal{M}$ and $\mathcal{T}$ are as in Definition 3.
This means that each column of the matrix $G(c)$ is a unimodular
constant multiple of an element of the finite Gabor system with window $c$,
namely $\{e^{2\pi i p q/P}\, \mathcal{T}^p \mathcal{M}^q c\}_{p,q=0}^{P-1}$.

(2) Note that the finite Gabor system defined above consists of $P^2$ vectors in
$\mathbb{C}^P$ which form an overcomplete tight frame for $\mathbb{C}^P$ [52]. For details on Gabor
frames in finite dimensions, see [321. 2210] and the overview article [54].

(3) Note that we are abusing notation slightly by identifying a vector $c \in \mathbb{C}^P$
with a $P$-periodic sequence $c = (c_n)$ in the obvious way.
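The structure of $G(c)$ and the tight frame property can be checked directly for a small example. The sketch below assumes the conventions $\omega = e^{2\pi i/P}$ and zero-based indexing; the window `c` is an arbitrary choice and the helper names are not from the source.

```python
import cmath

def translate(x):
    """T x = (x_{P-1}, x_0, ..., x_{P-2})."""
    return [x[-1]] + list(x[:-1])

def modulate(x):
    """M x = (w^0 x_0, w^1 x_1, ..., w^{P-1} x_{P-1}) with w = e^{2 pi i / P}."""
    P = len(x)
    return [cmath.exp(2j * cmath.pi * n / P) * x[n] for n in range(P)]

def power(op, times, x):
    """Apply the operator op to x the given number of times."""
    for _ in range(times):
        x = op(x)
    return x

def gabor_matrix(c):
    """Full Gabor system matrix [D_0 W | ... | D_{P-1} W] as a list of rows."""
    P = len(c)
    return [[c[(n - k) % P] * cmath.exp(2j * cmath.pi * n * m / P)
             for k in range(P) for m in range(P)]
            for n in range(P)]

c = [1.0, 2.0, 5.0]  # arbitrary window, P = 3
P = len(c)
G = gabor_matrix(c)

# Column m of block k (zero-based) coincides with a time-frequency shift of c.
col_gap = max(abs(G[n][k * P + m] - power(modulate, m, power(translate, k, c))[n])
              for k in range(P) for m in range(P) for n in range(P))

# Tight frame property: G G^* = P * ||c||^2 * I.
norm_sq = sum(abs(v) ** 2 for v in c)
frame_gap = max(abs(sum(G[n][j] * G[r][j].conjugate() for j in range(P * P))
                    - (P * norm_sq if n == r else 0.0))
                for n in range(P) for r in range(P))
print(col_gap, frame_gap)
```

The frame constant $P\|c\|^2$ follows because each row of $G(c)$ sees every entry of $c$ exactly $P$ times while distinct rows are orthogonal through the Fourier matrix.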

Definition 4. The Spark of an $M \times N$ matrix $F$ is the size of the smallest
linearly dependent subset of columns, i.e.,

$\mathrm{Spark}(F) = \min\{\|x\|_0 : Fx = 0,\ x \neq 0\}$

where $\|x\|_0$ is the number of nonzero components of the vector $x$. If $\mathrm{Spark}(F) =
M + 1$, then $F$ is said to have full Spark. $\mathrm{Spark}(F) = k$ implies that any collection
of fewer than $k$ columns of $F$ is linearly independent.
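For very small matrices the Spark can be computed by brute force, which makes the definition concrete. The sketch below is exponential in the number of columns and relies on a numerical rank with an assumed tolerance; it is meant only for toy sizes, and the two windows are illustrative.

```python
import cmath
from itertools import combinations

def gabor_matrix(c):
    """Full Gabor system matrix [D_0 W | ... | D_{P-1} W] as a list of rows."""
    P = len(c)
    return [[c[(n - k) % P] * cmath.exp(2j * cmath.pi * n * m / P)
             for k in range(P) for m in range(P)]
            for n in range(P)]

def rank(columns, tol=1e-9):
    """Numerical rank of a list of column vectors via Gaussian elimination."""
    rows = [list(r) for r in zip(*columns)]  # convert to row-major storage
    rk = 0
    for col in range(len(columns)):
        pivot = max(range(rk, len(rows)), key=lambda r: abs(rows[r][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def spark(F, tol=1e-9):
    """Size of the smallest linearly dependent set of columns of F (brute force)."""
    n_rows, n_cols = len(F), len(F[0])
    cols = [[F[r][j] for r in range(n_rows)] for j in range(n_cols)]
    for size in range(1, n_cols + 1):
        for subset in combinations(cols, size):
            if rank(list(subset), tol) < size:
                return size
    return n_cols + 1  # convention used here when all columns are independent

full = spark(gabor_matrix([1.0, 2.0]))       # P = 2, every pair of columns independent
deficient = spark(gabor_matrix([1.0, 1.0]))  # two blocks produce identical columns
print(full, deficient)
```

For $P = 2$ the six $2 \times 2$ minors of $G(c)$ are, up to sign, $2c_0c_1$, $c_0^2 - c_1^2$ and $c_0^2 + c_1^2$, so the window $(1,2)$ gives full Spark $3$ while $(1,1)$ gives Spark $2$.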


4.2 Finite Gabor frames are generically full Spark

The existence of Gabor matrices with full Spark has been addressed in [32, 34].
The results in these two papers are as follows.

Theorem 3. If $P \in \mathbb{N}$ is prime then there exists a dense, open subset of
$c \in \mathbb{C}^P$ such that every minor of the Gabor system matrix $G(c)$ is nonzero.
In particular, for such $c$, $G(c)$ has full Spark.

Theorem 4. [34] For every $P \in \mathbb{N}$, there exists a dense, open subset of
$c \in \mathbb{C}^P$ such that the Gabor system matrix $G(c)$ has full Spark.


The goal of this subsection is to outline the proof of Theorems 3 and 4.
We will adopt some of the following notation and terminology of [34].

Let $P \in \mathbb{N}$ and let $M$ be a $P \times P$ submatrix of $G(c)$. For $0 \le \kappa < P$
let $\ell_\kappa$ be the number of columns of $M$ chosen from the submatrix $D_\kappa W_P$
of (23). While the vector $\ell = (\ell_\kappa)_{\kappa=0}^{P-1}$ does not determine $M$ uniquely, it
describes the matrix $M$ sufficiently well for our purposes. Define $M_\kappa$ to be the
$P \times \ell_\kappa$ matrix consisting of those columns of $M$ chosen from $D_\kappa W_P$. Given
the ordered partition $B = (B_0, B_1, \dots, B_{P-1})$ where $\{B_0, B_1, \dots, B_{P-1}\}$
forms a partition of $\{0, \dots, P-1\}$, and where for each $0 \le \kappa < P$, $|B_\kappa| =
\ell_\kappa$, let $M_\kappa(B_\kappa)$ be the $\ell_\kappa \times \ell_\kappa$ submatrix of $M_\kappa$ whose rows belong to $B_\kappa$.
Then $\det(M) = \sum_B \pm \prod_{\kappa=0}^{P-1} \det(M_\kappa(B_\kappa))$, where the sum is taken over all such
ordered partitions $B$. This formula is called the Lagrange expansion of the
determinant.

Each ordered partition $B$ corresponds to a permutation on $\mathbb{Z}_P$ as follows.
Define the trivial partition $A = (A_0, A_1, \dots, A_{P-1})$ by

$A_j = \Big\{ \sum_{i=0}^{j-1} \ell_i,\ \sum_{i=0}^{j-1} \ell_i + 1,\ \dots,\ \Big( \sum_{i=0}^{j} \ell_i \Big) - 1 \Big\}$

so that $A_0 = [0, \ell_0 - 1]$, $A_1 = [\ell_0, \ell_0 + \ell_1 - 1]$, $\dots$, $A_{P-1} = [\ell_0 + \cdots +
\ell_{P-2}, P-1]$. Then given $B = (B_0, B_1, \dots, B_{P-1})$ there is a permutation
$\sigma \in S_P$ such that $\sigma(A_\kappa) = B_\kappa$ for all $\kappa$. This $\sigma$ is unique up to permutations
that preserve $A$, that is, up to $\tau \in S_P$ such that $\tau(A_\kappa) = A_\kappa$ for all $\kappa$. Call
such a permutation trivial and denote by $\Gamma$ the subgroup of $S_P$ consisting of
all trivial permutations. Then the ordered partitions $B$ of $\mathbb{Z}_P$ can be indexed
by equivalence classes of permutations $\sigma \in S_P/\Gamma$.

The key observation is that $\det(M)$ is a homogeneous polynomial in the
$P$ variables $c_0, c_1, \dots, c_{P-1}$ and we can write

$\det(M) = \sum_{\sigma \in S_P/\Gamma} a_\sigma C^\sigma$ (24)

where the monomial $C^\sigma$ is given by

$C^\sigma = \prod_{\kappa=0}^{P-1} \prod_{j \in \sigma(A_\kappa)} c_{(j-\kappa) \bmod P}$.

If it can be shown that this polynomial does not vanish identically then we
can choose a dense, open subset of $c \in \mathbb{C}^P$ for which $\det(M) \neq 0$. Since there
are only finitely many $P \times P$ submatrices of $G(c)$ it follows that there is a
dense, open subset of $c$ for which $\det(M) \neq 0$ for all $M$, and we conclude
that, for these $c$, $G(c)$ has full Spark.

Following [34], we say that a monomial $C^{\sigma_0}$ appears uniquely in (24) if for
every $\sigma \in S_P/\Gamma$ such that $\sigma \neq \sigma_0$, $C^\sigma \neq C^{\sigma_0}$. Therefore, in order to show
that the polynomial (24) does not vanish identically, it is sufficient to show
that (1) there is a monomial $C^\sigma$ that appears uniquely in (24) and (2) the
coefficient $a_\sigma$ of this monomial does not vanish.

Obviously, whether or not (24) vanishes identically does not depend on how
the variables $c_j$ are labelled. More specifically, if the variables are renamed
by a cyclical shift of the indices, viz., $c_j \mapsto c_{(j+\gamma) \bmod P}$ for some $0 \le \gamma < P$,
then

$\det(M)(c_{\gamma+1}, \dots, c_{P-1}, c_0, \dots, c_\gamma) = \pm \det(M')(c_0, \dots, c_{P-1})$

where $M'$ is a $P \times P$ submatrix described by the vector
$\ell' = (\ell_{\gamma+1}, \dots, \ell_{P-1}, \ell_0, \dots, \ell_\gamma)$.


4.2.1 The lowest index monomial 


In [32], a monomial referred to in [34] as the lowest index (LI) monomial is
defined that has the required properties when $P$ is prime. In order to see this,
note first that each coefficient $a_\sigma$ in the sum (24) is the product of minors
of the Fourier matrix $W_P$ and since $P$ is prime, Chebotarev’s Theorem says
that such minors do not vanish. More specifically,

$a_\sigma C^\sigma = \pm \prod_{\kappa=0}^{P-1} \det(M_\kappa(\sigma(A_\kappa)))$

and for each $\kappa$, the columns of $M_\kappa$ are columns of $W_P$ where each row has
been multiplied by the same variable $c_j$ and $M_\kappa(\sigma(A_\kappa))$ is a square matrix
formed by choosing $\ell_\kappa$ rows of $M_\kappa$. Hence for each $\kappa$, $\det(M_\kappa(\sigma(A_\kappa)))$ is a
monomial in $c$ with coefficient a constant multiple of a minor of $W_P$. Since
$a_\sigma$ is the product of those minors, it does not vanish.

Note moreover that each submatrix $M_\kappa(\sigma(A_\kappa))$ is an $\ell_\kappa \times \ell_\kappa$ matrix, so that
$\det(M_\kappa(\sigma(A_\kappa)))$ is the sum of a multiple of the product of $\ell_\kappa!$ diagonals of
$M_\kappa(\sigma(A_\kappa))$. Hence $a_\sigma C^\sigma$ is the sum of multiples of the product of $\prod_{\kappa=0}^{P-1} \ell_\kappa!$
generalized diagonals of $M$.

We define the LI monomial as in [32] as follows. If $M$ is $1 \times 1$, then $\det(M)$
is a multiple of a single variable $c_j$ and we define the LI monomial, $p_M$, by
$p_M = c_j$. If $M$ is $d \times d$, let $c_j$ be the variable of lowest index appearing in $M$.
Choose any entry of $M$ in which $c_j$ appears, eliminate the row and column
containing that entry, and call the remaining $(d-1) \times (d-1)$ matrix $M'$.
Define $p_M = c_j\, p_{M'}$. It is easy to see that the monomial $p_M$ is independent of
the entry of $M$ chosen at each step. In order to show that the LI monomial
appears uniquely in (24), we observe as in [32] that the number of diagonals
in $\det(M)$ that correspond to the LI monomial is $\prod_{\kappa=0}^{P-1} \ell_\kappa!$. Since this is
also the number of generalized diagonals appearing in the calculation of each
$\det(M_\kappa(\sigma(A_\kappa)))$, it follows that this monomial appears only once. The details
of the argument can be found in Section 7.2. Note that because $P$ is prime,
this argument is valid no matter how large the matrix $M$ is. In other words,
$M$ does not have to be a $P \times P$ submatrix in order for the result to hold.
Consequently, given $k < P$ and $M$ an arbitrary $P \times k$ submatrix of $G(c)$, we
can form the $k \times k$ matrix $M_0$ by choosing $k$ rows of $M$ in such a way that the
LI monomial of $M_0$ contains at most only the variables $c_0, \dots, c_{k-1}$. This
observation leads to the following theorem for matrices with arbitrary Spark.

Theorem 5. [50] If $P \in \mathbb{N}$ is prime, and $0 < k < P$, there exists an open,
dense subset of $c \in \mathbb{C}^k \times \{0\} \subseteq \mathbb{C}^P$ with the property that $\mathrm{Spark}(G(c)) = k + 1$.

This result has implications for relating the capacity of a time-variant
communication channel to the area of the spreading support, see [50].
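The recursive definition of the LI monomial given earlier in this subsection translates directly into a short procedure. The sketch below works on the pattern of variable indices only: it uses the fact that the entry of $G(c)$ in row $n$ of block $D_k W_P$ is a root of unity times $c_{(n-k) \bmod P}$. The function names and the example submatrix are illustrative assumptions.

```python
def li_monomial(var_index):
    """Lowest index monomial of a square matrix pattern.

    var_index[r][s] is the index j of the variable c_j appearing in entry (r, s).
    Returns the sorted list of variable indices in the monomial, with repetition.
    """
    rows = list(range(len(var_index)))
    cols = list(range(len(var_index)))
    picked = []
    while rows:
        # Locate an entry carrying the variable of lowest index still present.
        j, r, s = min((var_index[r][s], r, s) for r in rows for s in cols)
        picked.append(j)
        rows.remove(r)  # strike out the row ...
        cols.remove(s)  # ... and the column of that entry
    return sorted(picked)

def variable_pattern(P, blocks):
    """Pattern of the P x P submatrix of G(c) whose s-th column lies in block
    D_k W_P with k = blocks[s]; row n of such a column carries c_{(n-k) mod P}."""
    return [[(n - k) % P for k in blocks] for n in range(P)]

# Two columns from block 0 and one column from block 1, for P = 3.
pattern = variable_pattern(3, [0, 0, 1])
print(li_monomial(pattern))  # [0, 0, 2], i.e. the monomial c_0^2 c_2
```

Ties are broken by taking the first entry found; choosing the other admissible entry in this example leads to the same monomial, in line with the independence claim in the text.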


4.2.2 The consecutive index monomial 


In [34], a monomial referred to as the consecutive index (CI) monomial is
defined that has the required properties for any $P \in \mathbb{N}$. The CI monomial,
$C^I$, is defined as the monomial corresponding to the identity permutation
in $S_P/\Gamma$, that is, to the equivalence class of the trivial partition
$A = (A_0, A_1, \dots, A_{P-1})$. Hence

$C^I = \prod_{\kappa=0}^{P-1} \prod_{j \in A_\kappa} c_{(j-\kappa) \bmod P}$.

For each $\kappa$, the monomial appearing in $\det(M_\kappa(A_\kappa))$, $\prod_{j \in A_\kappa} c_{(j-\kappa) \bmod P}$,
consists of a product of $\ell_\kappa$ variables $c_j$ with consecutive indices modulo $P$.

That $a_I \neq 0$ follows from the observation that for each $\kappa$, $\det(M_\kappa(A_\kappa))$
is a monomial whose coefficient is a nonzero multiple of a Vandermonde
determinant and hence does not vanish (for details, see [31]). The proof that
$C^I$ appears uniquely in (24) amounts to showing that, with respect to an
appropriate cyclical renaming of the variables $c_i$, the CI monomial uniquely
minimizes the quantity $H(C^\sigma) = \sum_{i=0}^{P-1} i^2 \alpha_i$ where $\alpha_i$ is the exponent of
the variable $c_i$ in $C^\sigma$. An abbreviated version of the proof of this result as it
appears in [31] is given in Section 7.3.


As a final observation, we quote the following corollary that provides an
explicit construction of a unimodular vector $c$ such that $G(c)$ has full Spark.

Corollary 1. [34] Let $\zeta = e^{2\pi i/(P-1)^4}$ or any other primitive root of unity of
order $(P-1)^4$, where $P \geq 4$. Then the vector

$$c = (1, \zeta, \zeta^4, \ldots, \zeta^{(P-1)^2})$$

generates a Gabor frame for which $G(c)$ has full Spark.
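
The full Spark property can be checked by brute force for very small $P$. The following is a minimal sketch, not the construction used in [34]: the helper names `gabor_matrix`, `spark` and `corollary_vector` are hypothetical, the columns of $G(c)$ are assumed to be the vectors $T^q M^m c$ with entries $c_{p-q} e^{2\pi i m(p-q)/P}$ as in Section 7.1, and the vector of Corollary 1 is read as $c_n = \zeta^{n^2}$. The rank test uses a numerical tolerance, so it can only suggest, not prove, that a determinant is nonzero.

```python
import itertools
import numpy as np


def gabor_matrix(c):
    """P x P^2 matrix with columns T^q M^m c, (T^q M^m c)_p = c_{p-q} e^{2 pi i m (p-q)/P}."""
    c = np.asarray(c, dtype=complex)
    P = len(c)
    p = np.arange(P)
    columns = []
    for q in range(P):
        idx = (p - q) % P
        for m in range(P):
            columns.append(c[idx] * np.exp(2j * np.pi * m * idx / P))
    return np.column_stack(columns)


def spark(G, tol=1e-9):
    """Size of the smallest linearly dependent set of columns (exhaustive search).

    Returns rows + 1 when every set of `rows` columns is independent (full Spark).
    """
    rows, cols = G.shape
    for k in range(1, rows + 1):
        for subset in itertools.combinations(range(cols), k):
            if np.linalg.matrix_rank(G[:, list(subset)], tol=tol) < k:
                return k
    return rows + 1


def corollary_vector(P):
    """Assumed reading of Corollary 1: c_n = zeta^(n^2), zeta = exp(2 pi i / (P-1)^4)."""
    zeta = np.exp(2j * np.pi / (P - 1) ** 4)
    n = np.arange(P)
    return zeta ** (n ** 2)


print(spark(gabor_matrix([1, 2])))   # 3: all 2 x 2 minors are nonzero
print(spark(gabor_matrix([1, 1])))   # 2: c and Tc coincide
print(spark(gabor_matrix(corollary_vector(4))))
```

The search visits every subset of at most $P$ of the $P^2$ columns, so it is only practical for single-digit $P$.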


5 Generalizations of operator sampling to higher 
dimensions 

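The operator representations given earlier hold verbatim for higher dimensional
variables $x, \xi, t, \nu \in \mathbb{R}^d$. In this section, we address the identifiability
of

$$OPW(S) = \{H \in \mathcal{L}(L^2(\mathbb{R}^d), L^2(\mathbb{R}^d)) : \operatorname{supp} \mathcal{F}_s \sigma_H \subseteq S,\ \|\sigma_H\|_{L^2} < \infty\},$$

where $S \subseteq \mathbb{R}^{2d}$.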

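Looking at the components of the multidimensional variables separately,
Theorem 1 easily generalizes as follows.

Theorem 6. For $H \in OPW\big(\prod_{\ell=1}^d [0,T_\ell] \times \prod_{\ell=1}^d [-\Omega_\ell/2, \Omega_\ell/2]\big)$ with
$T_\ell \Omega_\ell \leq 1$, $\ell = 1, \ldots, d$, we have

$$\Big\| H \sum_{k_1 \in \mathbb{Z}} \cdots \sum_{k_d \in \mathbb{Z}} \delta_{(k_1 T_1, \ldots, k_d T_d)} \Big\|_{L^2(\mathbb{R}^d)} = T_1 \cdots T_d\, \|\sigma_H\|_{L^2},$$

and $H$ can be reconstructed by means of

$$\kappa_H(x+t, x) = \chi_{\prod_{\ell=1}^d [0,T_\ell]}(t) \sum_{n_1 \in \mathbb{Z}} \cdots \sum_{n_d \in \mathbb{Z}} \Big( H \sum_{k_1 \in \mathbb{Z}} \cdots \sum_{k_d \in \mathbb{Z}} \delta_{(k_1 T_1, \ldots, k_d T_d)} \Big)\big(t + (n_1 T_1, \ldots, n_d T_d)\big)\, \frac{\sin(\pi T_1 (x_1 - n_1))}{\pi T_1 (x_1 - n_1)} \cdots \frac{\sin(\pi T_d (x_d - n_d))}{\pi T_d (x_d - n_d)}$$

with convergence in the $L^2$ norm.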

In the following, we address the situation where $S$ is not contained in a set
$\prod_{\ell=1}^d [0,T_\ell] \times \prod_{\ell=1}^d [-\Omega_\ell/2, \Omega_\ell/2]$ with $T_\ell \Omega_\ell \leq 1$, $\ell = 1, \ldots, d$. For example,

S = [0,1] x [0, 2] x [0, |] x [0,1] C K 4 of volume \ is not covered by Theorem^ 
To give a higher dimensional variant of Theorem [2] we shall denote pointwise
products of finite and infinite length vectors $k$ and $T$ by $k \star T$, that is,
$k \star T = (k_1 T_1, \ldots, k_d T_d)$ for $k, T \in \mathbb{C}^d$. Similarly, $k/T = (k_1/T_1, \ldots, k_d/T_d)$.

Theorem 7. If $S \subseteq (0,\infty)^d \times \mathbb{R}^d$ is compact with $|S| < 1$ then $OPW(S)$
is identifiable. Specifically, there exist $T_1, \ldots, T_d > 0$ and pairwise relatively
prime natural numbers $P_1, \ldots, P_d$ such that

$$S \subseteq \prod_{\ell=1}^d [0, P_\ell T_\ell] \times \prod_{\ell=1}^d [-1/(2T_\ell), 1/(2T_\ell)],$$






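and a sequence $c = (c_n) \in \ell^\infty(\mathbb{Z}^d)$ which is $P_\ell$-periodic in the $\ell$-th component
$n_\ell$ such that $g = \sum_{n \in \mathbb{Z}^d} c_n \delta_{n \star T}$ identifies $OPW^2(S)$. In fact, for such $g$ there
exist for each $j \in J = \prod_{\ell=1}^d \{0, 1, \ldots, P_\ell - 1\}$ a sequence $b_j = (b_{j,k})$ which
is $P_\ell$-periodic in $k_\ell$ and $2d$-tuples $(q_j, m_j) \in J \times J$ with

$$h(x,t) = \sum_{k \in \mathbb{Z}^d} \sum_{j \in J} \Big[ b_{j,k}\, Hg(t - (q_j - k) \star T)\, e^{2\pi i\, m_j \cdot ((x-t)/(P \star T))}\, \phi(x - t + (q_j - k) \star T)\, r(t - q_j \star T) \Big]. \qquad (25)$$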

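The functions $r, \phi \in \mathcal{S}(\mathbb{R}^d)$ are assumed to satisfy

$$\sum_{k \in \mathbb{Z}^d} r(t + k \star T) = 1 = \sum_{n \in \mathbb{Z}^d} \widehat{\phi}(\gamma + n/(P \star T)), \qquad (26)$$

and $r(t)\,\widehat{\phi}(\gamma)$ is supported in a neighborhood of $\prod_{\ell=1}^d [0,T_\ell] \times \prod_{\ell=1}^d [0, 1/(P_\ell T_\ell)]$.

The sum in (25) converges unconditionally in $L^2$ and for each $t$ uniformly in $x$.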

This result follows from adjusting the proof of Theorem [7] to the higher
dimensional setting. For example, it will employ the Zak transform

$$Z_{T \star P} f(t, \nu) = \sum_{n \in \mathbb{Z}^d} f(t - n \star P \star T)\, e^{2\pi i\, \nu \cdot (n \star P \star T)},$$

where $P = (P_1, \ldots, P_d)$. We are then led again to a system of linear equations
of the form


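$$Z_{Hg}(t, \nu)_p = \sum_{q \in J} \sum_{m \in J} G(c)_{p,(q,m)}\, \boldsymbol{\eta}_H(t, \nu)_{(q,m)} \qquad (27)$$

where as before

$$Z_{Hg}(t, \nu)_p = (Z_{T \star P} \circ H) g(t + p \star T, \nu)\, e^{-2\pi i\, \nu \cdot (p \star T)},$$

$$\boldsymbol{\eta}_H(t, \nu)_{(q,m)} = (T_1 P_1 \cdots T_d P_d)\, \eta_H(t + q \star T, \nu + m/(T \star P))\, e^{-2\pi i\, \nu \cdot (q \star T)}\, e^{2\pi i\, q \cdot (m/P)},$$

and where $G(c)$ is now a multidimensional finite Gabor system matrix similar
to (23).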


In order to show that the spreading function for operator $H$ can be completely
determined by its response to the periodically-weighted $d$-dimensional
delta-train $g$, we need to show that (27) can be solved for each
$(t, \nu) \in \prod_{\ell=1}^d [0,T_\ell] \times \prod_{\ell=1}^d [0, 1/(T_\ell P_\ell)]$ if $c \in \mathbb{C}^{P_1 \times \cdots \times P_d}$ is chosen appropriately.

To see that a choice of $c$ is possible, observe that the product group
$\mathbb{Z}_{P_1} \times \cdots \times \mathbb{Z}_{P_d}$ is isomorphic to the cyclic group $\mathbb{Z}_{P_1 \cdots P_d}$ since the $P_\ell$ are chosen
pairwise relatively prime. Theorem [4] applied to the cyclic group $\mathbb{Z}_{P_1 \cdots P_d}$
guarantees the existence of $c \in \mathbb{C}^{P_1 \cdots P_d}$ so that the Gabor system matrix
$G(c)$ is full spark. We can now define $\tilde{c} \in \mathbb{C}^{P_1 \times \cdots \times P_d}$ by setting

$$\tilde{c}_{n_1, \ldots, n_d} = c_{n_1 + n_2 P_1 + n_3 P_1 P_2 + \cdots + n_d P_1 \cdots P_{d-1}}, \qquad (n_1, \ldots, n_d) \in J,$$

and observe that $G(\tilde{c})$ is simply a rearrangement of $G(c)$, hence, $G(\tilde{c})$ is full
spark.
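
The isomorphism between the product group and the cyclic group is the Chinese Remainder Theorem, and it fails as soon as two of the periods share a factor. The sketch below checks this by brute force for small periods. The function names are hypothetical, and the map $n \mapsto (n \bmod P_1, \ldots, n \bmod P_d)$ used here is the standard residue map, which is an assumption of this illustration and not the index map written above.

```python
from itertools import combinations
from math import gcd, prod


def pairwise_coprime(periods):
    """True if every two of the periods are relatively prime."""
    return all(gcd(a, b) == 1 for a, b in combinations(periods, 2))


def residue_map(periods):
    """Send n in Z_{P_1 ... P_d} to (n mod P_1, ..., n mod P_d)."""
    total = prod(periods)
    return {n: tuple(n % P for P in periods) for n in range(total)}


def is_group_isomorphism(periods):
    """Check bijectivity and additivity of the residue map exhaustively."""
    total = prod(periods)
    images = residue_map(periods)
    bijective = len(set(images.values())) == total
    additive = all(
        images[(a + b) % total]
        == tuple((x + y) % P for x, y, P in zip(images[a], images[b], periods))
        for a in range(total)
        for b in range(total)
    )
    return bijective and additive


print(pairwise_coprime((2, 3, 5)), is_group_isomorphism((2, 3, 5)))  # True True
print(pairwise_coprime((2, 4)), is_group_isomorphism((2, 4)))        # False False
```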


6 Further results on operator sampling 

The results discussed in this paper are discussed in detail in mmmm 
HU and [50]. The last listed article contains the most extensive collection 
of operator reconstruction formulas, including extensions to some OPW(S) 
with S unbounded. Moreover, some hints on how to use parallelograms to 
rectify a set S for operator sampling efficiently are given. 

A central result in [50] is the classification of all spaces $OPW(S)$ that are
identifiable for a given $g = \sum_{n} c_n \delta_{nT}$ with $(c_n)$ being $P$-periodic.

The papers [35] [3T] address some functional analytic challenges in operator 
sampling, and [28] focuses on the question of operator identification if we are
restricted to using more realizable identifiers, for example, truncated and
modified versions of $g$, namely, $\tilde{g}(t) = \sum_{n=0}^{N} c_n \varphi(t - nT)$. The problem of
recovering parametric classes of operators in $OPW(S)$ is discussed in [2,3].

In the following, we briefly review literature that addresses some other
directions in operator sampling.


6.1 Multiple Input Multiple Output 


A Multiple Input Multiple Output (MIMO) channel $\mathbf{H}$ with $N$ transmitters
and $M$ receivers can be modeled by an $N \times M$ matrix whose entries are
time-varying channel operators $H_{mn} \in OPW(S_{mn})$. For simplicity, we write
$\mathbf{H} \in OPW(\mathbf{S})$. Assuming that the operators $H_{mn}$ are independent, a sufficient
criterion for identifiability is given by $\sum_{n=1}^{N} |S_{mn}| \leq 1$ for $m = 1, \ldots, M$.
Conversely, if for a single $m$, $\sum_{n=1}^{N} |S_{mn}| > 1$, then $OPW(\mathbf{S})$ is not identifiable
by any collection $s_1, \ldots, s_N$ of input signals.

A somewhat dual setup was considered in [20]. Namely, a Single Input Single
Output (SISO) channel with $S$ being large, say $S = [0,M] \times [-N/2, N/2]$
with $N, M \geq 2$. As illustrated above, $OPW([0,M] \times [-N/2, N/2])$ is not
identifiable, but if we are allowed to use $MN$ (infinite duration) input signals
$g_1, \ldots, g_{MN}$, then $H \in OPW([0,M] \times [-N/2, N/2])$ can be recovered from
the $MN$ outputs $Hg_1, \ldots, Hg_{MN}$.


6.2 Irregular Sampling of Operators

The identifier $g = \sum_{n} c_n \delta_{nT}$ is supported on the lattice $T\mathbb{Z}$ in $\mathbb{R}$. In general,
for stable operator identification, choosing a discretely supported identifier
is reasonable; indeed, in [25] it is shown that identification for $OPW(S)$ in
full requires the use of identifiers that decay neither in time nor in frequency.
(The characteristics of $H$ during a fixed transmission band and
a fixed transmission interval can indeed be recovered when using Schwartz
class identifiers [28].)

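In irregular operator sampling, we consider identifiers of the form
$g = \sum_{n \in \mathbb{Z}} c_n \delta_{\lambda_n}$ for nodes $\lambda_n$ that are not necessarily contained in a lattice. If
such $g$ identifies $OPW(S)$, then we refer to $\operatorname{supp} g = \Lambda = \{\lambda_n\}$ as a sampling set
for $OPW(S)$, and similarly, the sampling rate of $g$ is defined to be

$$D(g) = D(\Lambda) = \lim_{r \to \infty} \frac{n^-(r)}{r},$$

where

$$n^-(r) = \inf_{x \in \mathbb{R}} \#\{\Lambda \cap [x, x + r]\},$$

assuming that the limit exists.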

To illustrate a striking difference between irregular sampling of functions
and operators, note that $\mathbb{Z}$ is a sampling set for $OPW([0,1] \times [-\tfrac12, \tfrac12])$ as
well as for the Paley-Wiener space $PW([-\tfrac12, \tfrac12])$, but the distribution
$g = c_0 \delta_{\lambda_0} + \sum_{n \in \mathbb{Z} \setminus \{0\}} c_n \delta_n$ does not identify $OPW([0,1] \times [-\tfrac12, \tfrac12])$, regardless
of our choice of $c_n$ and $\lambda_0 \neq 0$. This shows that, for example, Kadec's 1/4
theorem does not generalize to the operator setting.

With $D(g) = D(\Lambda) \geq B(S)$ we give a necessary condition on the
(operator) sampling rate based on the bandwidth $B(S)$ of $OPW(S)$ which is
defined as



$$B(S) = \Big\| \int_{\mathbb{R}} \chi_S(\cdot, \nu)\, d\nu \Big\|_\infty. \qquad (28)$$


Here, $\chi_S$ denotes the characteristic function of $S$. This quantity can be
interpreted as the maximum vertical extent of $S$ and takes into account gaps in
$S$. Moreover, we discuss the goal of constructing $\{\lambda_n\}$ of small density,
and/or large gaps in order to reserve time-slots for information transmission.
Results in this direction can be interpreted as giving bounds on the capacity
of a time-variant channel in $OPW(S)$ in terms of $|S|$ [50].

Finally, we give in [50] an example of an operator class $OPW(S)$ that
cannot be identified by any identifier of the form $g = \sum_{n \in \mathbb{Z}} c_n \delta_{nT}$ with
$T > 0$ and periodic $(c_n)$, but requires coefficients that form a bounded but
non-periodic sequence. In this case, $S$ is a parallelogram and $B(S) = D(g)$
(see Figure 5).










Fig. 5: The operator class $OPW^2(S)$ with $S = \begin{pmatrix} 2 & 2 \\ \sqrt{2} & \sqrt{2} + 1/2 \end{pmatrix} [0,1]^2$,
whose area equals 1 and bandwidth equals 1/2, is identifiable by a
(non-periodically) weighted delta train with sampling density 1/2. It is not
identifiable using a periodically-weighted delta train.
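
The two numbers quoted in the caption can be reproduced with a few lines of code. The sketch below takes the bandwidth to be the maximum vertical extent of $S$, as described in the text, and assumes that the matrix in the caption has rows $(2, 2)$ and $(\sqrt{2}, \sqrt{2} + 1/2)$; the function names are hypothetical. For a convex polygon the slice length is a concave piecewise-linear function of $t$, so it suffices to evaluate it at the $t$-coordinates of the vertices.

```python
import numpy as np


def parallelogram_vertices(A):
    """Vertices, in cyclic order, of the image of [0,1]^2 under the 2 x 2 matrix A."""
    A = np.asarray(A, dtype=float)
    u, v = A[:, 0], A[:, 1]
    return [np.zeros(2), u, u + v, v]


def vertical_extent(vertices, t):
    """Length of the slice {nu : (t, nu) in polygon} of a convex polygon."""
    nus = []
    n = len(vertices)
    for i in range(n):
        t0, nu0 = vertices[i]
        t1, nu1 = vertices[(i + 1) % n]
        if t0 == t1:
            if t == t0:                      # vertical edge lying in the slice
                nus += [nu0, nu1]
        elif min(t0, t1) <= t <= max(t0, t1):
            s = (t - t0) / (t1 - t0)         # position of the crossing on the edge
            nus.append(nu0 + s * (nu1 - nu0))
    return max(nus) - min(nus) if nus else 0.0


def bandwidth(A):
    """Maximum vertical extent of the parallelogram A [0,1]^2."""
    verts = parallelogram_vertices(A)
    return max(vertical_extent(verts, vert[0]) for vert in verts)


A = [[2.0, 2.0], [np.sqrt(2.0), np.sqrt(2.0) + 0.5]]
print(abs(np.linalg.det(A)))   # area of S, approximately 1
print(bandwidth(A))            # bandwidth of S, approximately 0.5
```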


6.3 Sampling of OPW(S) with unknown S. 


In some applications, it is justified to assume that the set $S$ has small area, but
its shape and location are unknown. If further $S$ satisfies some basic geometric
assumptions that guarantee that $S$ is contained in $[0, TP] \times [-1/(2T), 1/(2T)]$ and
meets only a few rectangles of the rectification grid $[kT, (k+1)T] \times [q/(TP), (q+1)/(TP)]$,
then recovery of $S$ and, hence, of an operator in $OPW(S)$ is possible.

The independently obtained results in ESI US] employ the same identifiers 
$g = \sum_{n \in \mathbb{Z}} c_n \delta_{nT}$ as introduced above. Operator identification is therefore
again reduced to solving (18), that is, the system of $P$ linear equations

$$Z(t, \nu) = G(c)\, \eta(t, \nu) \qquad (29)$$

for the vector $\eta(t, \nu) \in \mathbb{C}^{P^2}$ for $(t, \nu) \in [0,T] \times [-1/(2TP), 1/(2TP)]$. While the
zero components of $\eta(t, \nu)$ are not known, the vector is known to be very
sparse. Hence, for fixed $(t, \nu)$, we can use the fact that $G(c)$ is full spark and
recover $\eta(t, \nu)$ if it has at most $P/2$ nonzero entries. Indeed, assume $\eta(t, \nu)$
and $\tilde{\eta}(t, \nu)$ solve (29) and both have at most $P/2$ nonzero entries. Then
$\eta(t, \nu) - \tilde{\eta}(t, \nu)$ has at most $P$ nonzero entries and the fact that $G(c)$ is full
spark indicates that $G(c)(\eta(t, \nu) - \tilde{\eta}(t, \nu)) = 0$ implies $\eta(t, \nu) - \tilde{\eta}(t, \nu) = 0$.
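
The uniqueness argument translates directly into an exhaustive search over candidate supports. The following is a minimal sketch under stated assumptions, not the recovery scheme of the cited papers: `gabor_matrix` and `sparse_solve` are hypothetical names, the columns of $G(c)$ are taken to be $T^q M^m c$ as in Section 7.1, and the example uses $P = 3$ with $c = (1, 2, 4)$, for which no two columns of $G(c)$ are parallel, so that vectors with a single nonzero entry are recovered uniquely.

```python
import itertools
import numpy as np


def gabor_matrix(c):
    """P x P^2 matrix with columns T^q M^m c, (T^q M^m c)_p = c_{p-q} e^{2 pi i m (p-q)/P}."""
    c = np.asarray(c, dtype=complex)
    P = len(c)
    p = np.arange(P)
    columns = []
    for q in range(P):
        idx = (p - q) % P
        for m in range(P):
            columns.append(c[idx] * np.exp(2j * np.pi * m * idx / P))
    return np.column_stack(columns)


def sparse_solve(G, Z, max_nonzeros, tol=1e-9):
    """Return the first vector with at most max_nonzeros nonzero entries solving G eta = Z.

    Supports are tried in order of increasing size; None is returned if no
    support of the allowed size reproduces Z up to the relative tolerance.
    """
    n_cols = G.shape[1]
    if np.linalg.norm(Z) <= tol:
        return np.zeros(n_cols, dtype=complex)
    for size in range(1, max_nonzeros + 1):
        for support in itertools.combinations(range(n_cols), size):
            cols = list(support)
            coeffs, *_ = np.linalg.lstsq(G[:, cols], Z, rcond=None)
            if np.linalg.norm(G[:, cols] @ coeffs - Z) <= tol * np.linalg.norm(Z):
                eta = np.zeros(n_cols, dtype=complex)
                eta[cols] = coeffs
                return eta
    return None


P = 3
G = gabor_matrix([1, 2, 4])
eta_true = np.zeros(P * P, dtype=complex)
eta_true[5] = 2 - 1j                       # a single active cell
eta_rec = sparse_solve(G, G @ eta_true, max_nonzeros=P // 2)
print(np.flatnonzero(np.abs(eta_rec) > 1e-9))   # [5]
```

The number of supports of size $P/2$ among $P^2$ columns grows combinatorially in $P$, which is why this direct approach does not scale.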

Clearly, under the geometric assumptions alluded to above, the criterion
that at most $P/2$ rectangles in the grid are met translates into the condition
that the unknown set $S$ has area at most 1/2, since each rectangle of the
rectification grid has area $T \cdot 1/(TP) = 1/P$.

Fig. 6: For $S$ the union of the colored sets, $OPW(S)$ is identifiable even
though 7 > 3 boxes are active, implying that rectifying $S$ with
$P = 3$ and $T = 1$ is not possible (see Section [373]). Recovering $\eta$ from $Hg$
requires solving three systems of linear equations, one to recover $\eta$ on the
yellow support set, one to recover $\eta$ on the red support set, and one to recover
$\eta$ on the blue support set. The reconstruction formula (21) does not apply
for this set $S$.


In [TB], this area 1/2 criterion is improved by showing that H can be 
identified whenever at most P — 1 rectangles in the rectification grid are met 
by S. This result is achieved by using a joint sparsity argument, based on the 
assumption that for all ( t , v), the same cells are active. 

Alternatively, the "area 1/2" result can be strengthened by not assuming
that for all $(t, \nu)$, the same cells are active. This requires solving (29), for
$\eta(t, \nu)$ sparse, for each considered $(t, \nu)$ independently, see Figure 6 and [50].

It must be added, though, that solving (29) for $\eta(t, \nu)$ being $P/2$ sparse is
computationally infeasible already for moderately sized $P$, for example for $P > 15$,
since all candidate supports have to be tested. If we further
reduce the number of active boxes, then compressive sensing algorithms such
as Basis Pursuit and Orthogonal Matching Pursuit become available, as is
discussed in the following section.


6.4 Finite dimensional operator identification and 
compressive sensing 


Operator sampling in the finite dimensional setting translates into the
following matrix probing problem [351 71 [Bi. For a class of matrices $\mathcal{M} \subseteq
\mathbb{C}^{P \times P}$, find $c \in \mathbb{C}^P$ so that we can recover $M \in \mathcal{M}$ from $Mc$.

Fig. 7: The matrix probing problem: find $c$ so that the map $\mathcal{M} \to \mathbb{C}^P$,
$M \mapsto Mc$ is injective and therefore invertible.


The classes of operators considered here are of the form $M_\eta = \sum_\lambda \eta_\lambda B_\lambda$
with $B_\lambda = B_{p,q} = T^p M^q$, and the matrix identification problem is reduced
to solving

$$Z = M_\eta c = \sum_{p,q=0}^{P-1} \eta_{p,q}\, (T^p M^q c) = G(c)\,\eta, \qquad (30)$$

where $c$ is chosen appropriately; this is just (29) with the dependence on $(t, \nu)$
removed.

If $\eta$ is assumed to be $k$-sparse, we arrive at the classical compressive sensing
problem with measurement matrix $G(c) \in \mathbb{C}^{P \times P^2}$ which depends on
$c = (c_0, c_1, \ldots, c_{P-1})$. To achieve recovery guarantees for Basis Pursuit and
Orthogonal Matching Pursuit, averaging arguments have to be used that
yield results on the expected qualities of $G(c)$. This problem was discussed
in [45j HU 06] as well as, in slightly different terms, in mm]. The strongest 
results were achieved in m by estimating Restricted Isometry Constants for 
$c$ being a Steinhaus sequence. These results show that with high probability,
$G(c)$ has the property that Basis Pursuit recovers $\eta$ from $G(c)\eta$ for every $k$-sparse
$\eta$ as long as $k \leq C P/\log^2 P$ for some universal constant $C$.
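
To make the greedy alternative concrete, the following is a minimal sketch of Orthogonal Matching Pursuit applied to $G(c)$. It is a textbook version of the algorithm written for illustration, not the implementation analyzed in the cited works; the function names are hypothetical, and a Steinhaus sequence is taken to mean independent entries distributed uniformly on the unit circle. No recovery guarantee is claimed for the random example at the end.

```python
import numpy as np


def gabor_matrix(c):
    """P x P^2 matrix with columns T^q M^m c, (T^q M^m c)_p = c_{p-q} e^{2 pi i m (p-q)/P}."""
    c = np.asarray(c, dtype=complex)
    P = len(c)
    p = np.arange(P)
    columns = []
    for q in range(P):
        idx = (p - q) % P
        for m in range(P):
            columns.append(c[idx] * np.exp(2j * np.pi * m * idx / P))
    return np.column_stack(columns)


def steinhaus_sequence(P, rng):
    """Independent unimodular entries with uniformly distributed phases."""
    return np.exp(2j * np.pi * rng.random(P))


def omp(G, Z, k, tol=1e-10):
    """Orthogonal Matching Pursuit: select at most k columns of G greedily."""
    n_cols = G.shape[1]
    col_norms = np.linalg.norm(G, axis=0)
    support = []
    coeffs = np.zeros(0, dtype=complex)
    residual = np.asarray(Z, dtype=complex)
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break
        correlations = np.abs(G.conj().T @ residual) / col_norms
        if support:
            correlations[support] = 0.0      # never pick a column twice
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected coefficients by least squares.
        coeffs, *_ = np.linalg.lstsq(G[:, support], Z, rcond=None)
        residual = Z - G[:, support] @ coeffs
    eta = np.zeros(n_cols, dtype=complex)
    if support:
        eta[support] = coeffs
    return eta


rng = np.random.default_rng(0)
P, k = 16, 2
G = gabor_matrix(steinhaus_sequence(P, rng))
eta_true = np.zeros(P * P, dtype=complex)
eta_true[rng.choice(P * P, size=k, replace=False)] = rng.standard_normal(k)
eta_rec = omp(G, G @ eta_true, k)
print(np.linalg.norm(eta_rec - eta_true))
```

Each iteration costs one matrix-vector product and one small least-squares solve, in contrast to the exhaustive search over supports.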



Fig. 8: Time-frequency structured measurement matrix G(c) with c randomly 
chosen. 


6.5 Stochastic operators and channel estimation 

It is common that models of wireless channels and radar environments take
the stochastic nature of the medium into account. In such models, the spreading
function $\eta(t, \nu)$ (and therefore the operator's kernel and Kohn-Nirenberg
symbol) is a random process, and the operator is split into the sum of its
deterministic portion, representing the mean behavior of the channel, and its
zero-mean stochastic portion that represents the noise and the environment.





Fig. 9: Support sets of autocorrelation functions, the general case, the WSSUS 
case, and the tensor case. 


The detailed analysis of the stochastic case was carried out in EH El]. 
One of the foci of these works lies in the goal of determining the second-order
statistics of the (zero mean) stochastic process $\eta(\tau, \nu)$, that is, its so called
covariance function $R_\eta(\tau, \nu, \tau', \nu') = E\{\eta(\tau, \nu)\, \overline{\eta(\tau', \nu')}\}$. In [521 [FT], it was
shown that a necessary but not sufficient condition for the identifiability of
$R_\eta(\tau, \nu, \tau', \nu')$ from the output covariance $A(t, t') = E\{Hg(t)\, \overline{Hg(t')}\}$ is that
$R_\eta(\tau, \nu, \tau', \nu')$ is supported on a bounded set of 4-dimensional volume less
than or equal to one. Unfortunately, for some sets $S \subseteq \mathbb{R}^4$ of arbitrarily small
measure, the respective stochastic operator Paley-Wiener space $StOPW(S)$
of operators with $R_\eta$ supported on $S$ is not identifiable; this is a striking
difference to the deterministic setup where the geometry of $S$ does not play
a role at all.

In {37j S3, the special case of wide-sense stationary operators with uncor¬ 
related scattering, or WSSUS operators is considered. These operators are 
characterized by the property that 

$$R_\eta(\tau, \nu, \tau', \nu') = C_\eta(\tau, \nu)\, \delta(\tau - \tau')\, \delta(\nu - \nu').$$

The function $C_\eta(\tau, \nu)$ is then called the scattering function of $H$. Our results on
the identifiability of stochastic operator classes allowed for the construction
of two estimators for scattering functions 133 Eli. The estimator given in [53] 
is applicable whenever the scattering function of $H$ has bounded support.
Note that the autocorrelation of a WSSUS operator is supported on a
two-dimensional plane in $\mathbb{R}^4$, which therefore has 4D volume 0, a fact that allows
us to lift commonly assumed restrictions on the size of the 2D area of the
support of the scattering function.

For details, formal definitions of identifiability and detailed statements of
results we refer to the papers mmmm- 


7 Appendix: Proofs of Theorems. 

7.1 Proof of Lemma [|] 

In order to see how the time-frequency shifts of $c$ arise, we will briefly outline
the calculation that leads to the stated identity. It can be seen by direct
calculation using the spreading function representation of $H$ that if
$g = \sum_n \delta_{nTP}$ then $\langle Hg, s \rangle = \langle \eta_H, Z_{TP}\, s \rangle$
for all $s \in \mathcal{S}(\mathbb{R})$, where the bracket on the left is the $L^2$ inner product on $\mathbb{R}$ and
that on the right the $L^2$ inner product on the rectangle $[0, TP] \times [0, 1/(TP)]$.
Periodizing the integral on the left gives

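$$\langle \eta_H, Z_{TP}\, s \rangle = \int_0^{1/(TP)} \int_0^{TP} \sum_k \sum_m \eta_H(t + kTP, \nu + m/(TP))\, e^{-2\pi i \nu kTP}\, \overline{Z_{TP}\, s(t, \nu)}\, dt\, d\nu.$$

Since this holds for every $s \in \mathcal{S}(\mathbb{R})$, we conclude that

$$(Z_{TP} \circ H) g(t, \nu) = \frac{1}{TP} \sum_k \sum_m \eta_H(t + kTP, \nu + m/(TP))\, e^{-2\pi i \nu kTP}.$$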

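Given $g = \sum_{n \in \mathbb{Z}} c_n \delta_{nT}$ for a period-$P$ sequence $c = (c_n)$, and letting
$n = mP - q$ for $m \in \mathbb{Z}$ and $0 \leq q < P$, we obtain

$$g = \sum_{n \in \mathbb{Z}} c_n \delta_{nT} = \sum_{q=0}^{P-1} \sum_{m \in \mathbb{Z}} c_{mP-q}\, \delta_{mPT - qT} = \sum_{q=0}^{P-1} c_{-q}\, T_{-qT} \Big( \sum_{m \in \mathbb{Z}} \delta_{mPT} \Big).$$

Since for $a \in \mathbb{R}$, the spreading function of $H \circ T_a$ is $\eta_H(t - a, \nu)\, e^{2\pi i \nu a}$, we
arrive at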

$$(Z_{TP} \circ H) g(t, \nu) = \frac{1}{TP} \sum_{q=0}^{P-1} c_{-q} \sum_k \sum_m \eta_H(t + kTP + qT, \nu + m/(TP))\, e^{-2\pi i (\nu + m/(TP)) qT}\, e^{-2\pi i \nu kTP}. \qquad (31)$$

Letting $m = jP + \ell$ for $j \in \mathbb{Z}$ and $0 \leq \ell < P$, we obtain

$$(Z_{TP} \circ H) g(t, \nu) = \frac{1}{TP} \sum_{q=0}^{P-1} c_{-q} \sum_k \sum_j \sum_{\ell=0}^{P-1} \eta_H(t + kTP + qT, \nu + j/T + \ell/(TP))\, e^{-2\pi i \nu qT}\, e^{-2\pi i \ell q/P}\, e^{-2\pi i \nu kTP}$$

$$= \frac{1}{TP} \sum_{q=0}^{P-1} \sum_{\ell=0}^{P-1} \big( c_{-q}\, e^{-2\pi i \ell q/P} \big)\, e^{-2\pi i \nu qT}\, \eta_H(t + qT, \nu + \ell/(TP)).$$

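Finally, replacing $t$ by $t + pT$ for $p = 0, 1, \ldots, P-1$, and changing indices
by replacing $q$ by $q - p$, we obtain

$$(Z_{TP} \circ H) g(t + pT, \nu) = \frac{1}{TP} \sum_{q=0}^{P-1} \sum_{\ell=0}^{P-1} \big( c_{-q}\, e^{-2\pi i \ell q/P} \big)\, e^{-2\pi i \nu qT}\, \eta_H(t + (q + p)T, \nu + \ell/(TP))$$

$$= \frac{1}{TP} \sum_{q=0}^{P-1} \sum_{\ell=0}^{P-1} \big( c_{-(q-p)}\, e^{-2\pi i \ell (q-p)/P} \big)\, e^{-2\pi i \nu (q-p)T}\, \eta_H(t + qT, \nu + \ell/(TP)).$$

The observation that $(T^q M^m c)_p = c_{p-q}\, e^{2\pi i m (p-q)/P}$ completes the proof.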
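
The closing identity is easy to confirm numerically. The sketch below assumes the usual finite-dimensional conventions, which are not restated in this section: $T$ is the cyclic shift $(Tx)_p = x_{p-1}$ and $M$ is the modulation $(Mx)_p = e^{2\pi i p/P} x_p$, with indices taken modulo $P$. The function names are hypothetical.

```python
import numpy as np


def translate(x, q):
    """Cyclic shift: (T^q x)_p = x_{(p - q) mod P}."""
    return np.roll(x, q)


def modulate(x, m):
    """Modulation: (M^m x)_p = exp(2 pi i m p / P) x_p."""
    P = len(x)
    return np.exp(2j * np.pi * m * np.arange(P) / P) * x


def closed_form(c, q, m):
    """Right-hand side of the identity: c_{p-q} exp(2 pi i m (p - q) / P)."""
    P = len(c)
    p = np.arange(P)
    return c[(p - q) % P] * np.exp(2j * np.pi * m * (p - q) / P)


rng = np.random.default_rng(0)
P = 7
c = rng.standard_normal(P) + 1j * rng.standard_normal(P)
ok = all(
    np.allclose(translate(modulate(c, m), q), closed_form(c, q, m))
    for q in range(P)
    for m in range(P)
)
print(ok)   # True
```

With these conventions the order of the two operators matters only up to a phase, since $M^m T^q = e^{2\pi i mq/P}\, T^q M^m$.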


7.2 Proof of Theorem [ 5 ] 

To see why this is true, define $\mu(M)$ to be the number of diagonals of $M$
whose product is a multiple of $p_M$, and proceed by induction on the size of
the matrix $M$. If $M$ is $1 \times 1$ then the result is obvious. Suppose that $M$ is
$n \times n$ and that it is described by the vector $\ell = (\ell_0, \ldots, \ell_{P-1})$. Assuming
without loss of generality that the variable of smallest index in $p_M$ with a
nonzero exponent is $c_0$, there is a row of $M$ in which the variable $c_0$ appears
$\ell_j$ times for some index $j$. Choose one of these terms and delete the row
and column in which it appears. Call the remaining matrix $M'$. The vector $\ell'$
describing $M'$ is $(\ell_0, \ldots, \ell_{j-1}, \ell_j - 1, \ell_{j+1}, \ldots, \ell_{P-1})$, and is independent of
which term was chosen from the given row to form $M'$. By the construction
of the LI monomial, $p_M = c_0\, p_{M'}$ and by the induction hypothesis

$$\mu(M') = \ell_0! \cdots \ell_{j-1}!\, (\ell_j - 1)!\, \ell_{j+1}! \cdots \ell_{P-1}!.$$

Since there are $\ell_j$ ways to choose a term from the given row to produce $M'$
we have that

$$\mu(M) = \ell_j\, \mu(M') = \ell_0! \cdots \ell_{j-1}!\, \ell_j (\ell_j - 1)!\, \ell_{j+1}! \cdots \ell_{P-1}! = \prod_{\kappa=0}^{P-1} \ell_\kappa!$$

which was to be proved.

Since each term $a_\sigma C^\sigma$ in (24) is made up of a sum of precisely this many
terms, it follows that exactly one of these terms is a multiple of the LI monomial.
Alternatively, we can think of the LI monomial as the one corresponding
to the $\sigma \in S_P/\Gamma$ that minimizes the functional $\Lambda_0(C^\sigma) = \sum_{i=0}^{P-1} i^2 H(\alpha_i)$
where $\alpha_i$ is the exponent of $c_i$ in $C^\sigma$ and where $H(\alpha_i) = 0$ if $\alpha_i = 0$ and 1
otherwise.

Because by Chebotarev's Theorem, $a_\sigma \neq 0$ for all $\sigma$, the proof works for
any square submatrix $M$, no matter what size. This gives us Theorem [3]


7.3 Proof of Theorem [4]


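We first need to assert the existence of a cyclical renumbering of the variables
such that with respect to the new trivial partition $A' = (A'_\kappa)_{\kappa=0}^{P-1}$, the CI
monomial is given by

$$C^I = \prod_{\kappa=0}^{P-1} \prod_{j \in A'_\kappa} c_{j-\kappa},$$

in other words, if $j \in A'_\kappa$ then $0 \leq j - \kappa < P$. Note first that since
$\min(A'_\kappa) = \sum_{i=0}^{\kappa-1} \ell'_i$, $j \in A'_\kappa$ implies that $j \geq \sum_{i=0}^{\kappa-1} \ell'_i$. Therefore, it will suffice
to find a $0 \leq \gamma < P$ such that for all $\kappa$, $\sum_{i=0}^{\kappa-1} \ell'_i - \kappa \geq 0$, so that
$j - \kappa \geq \sum_{i=0}^{\kappa-1} \ell'_i - \kappa \geq 0$.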

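Let $0 \leq \gamma < P$ be such that the quantity $\sum_{i=0}^{\gamma-1} \ell_i - \gamma$ is minimized, let

$$\ell' = (\ell'_i)_{i=0}^{P-1} = (\ell_{(i+\gamma) \bmod P})_{i=0}^{P-1}$$

and let $A' = (A'_\kappa)_{\kappa=0}^{P-1}$ be the corresponding trivial partition. Now fix $\kappa$ and
assume that $\kappa + \gamma \leq P$. Then

$$\sum_{i=0}^{\kappa-1} \ell'_i - \kappa = \sum_{i=0}^{\kappa-1} \ell_{(i+\gamma)} - \kappa = \Big( \sum_{i=0}^{\kappa+\gamma-1} \ell_i - (\kappa + \gamma) \Big) - \Big( \sum_{i=0}^{\gamma-1} \ell_i - \gamma \Big) \geq 0$$

since the second term in the difference is minimal. If $\kappa + \gamma \geq P + 1$ then
remembering that $\sum_{i=0}^{P-1} \ell_i = P$,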
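
$$\sum_{i=0}^{\kappa-1} \ell'_i - \kappa = \sum_{i=0}^{\kappa-1} \ell_{(i+\gamma) \bmod P} - \kappa = \sum_{i=\gamma}^{P-1} \ell_i + \sum_{i=0}^{\kappa+\gamma-P-1} \ell_i - \kappa$$

$$= \sum_{i=0}^{P-1} \ell_i - \sum_{i=0}^{\gamma-1} \ell_i + \sum_{i=0}^{\kappa+\gamma-P-1} \ell_i - \kappa = \Big( \sum_{i=0}^{\kappa+\gamma-P-1} \ell_i - (\kappa + \gamma - P) \Big) - \Big( \sum_{i=0}^{\gamma-1} \ell_i - \gamma \Big) \geq 0.$$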


In order to complete the proof, we must show that $\Lambda(C^\sigma) \geq \Lambda(C^I)$ for all
$\sigma \in S_P/\Gamma$ with equality holding if and only if $\sigma$ is trivial. This will follow
by direct calculation together with the following lemma which follows from
a classical result on rearrangements of series ([2], Theorems 368, 369). This
result is Lemma 3.3 in [34].

First, however, we adopt the following notation. For $0 \leq n < P$, let $b_n = \kappa$
if $n \in A_\kappa$. With this notation, given $\sigma \in S_P/\Gamma$,

$$C^\sigma = \prod_{n=0}^{P-1} c_{(\sigma(n) - b_n) \bmod P}$$

and under the above assumptions,

$$C^I = \prod_{n=0}^{P-1} c_{(n - b_n)}.$$

Moreover,

$$\Lambda(C^\sigma) = \sum_{i=0}^{P-1} i^2 \alpha_i = \sum_{i=0}^{P-1} i^2 \big( \#\{n : (\sigma(n) - b_n) \bmod P = i\} \big) = \sum_{n=0}^{P-1} \big( (\sigma(n) - b_n) \bmod P \big)^2.$$

Lemma 3. Given two finite sequences of real numbers $(\alpha_n)$ and $(\beta_n)$ defined
up to rearrangement, the sum

$$\sum_n \alpha_n \beta_n$$

is maximized when $\alpha$ and $\beta$ are both monotonically increasing or monotonically
decreasing. Moreover, if for every rearrangement $\alpha'$ of $\alpha$,

$$\sum_n \alpha'_n \beta_n \leq \sum_n \alpha_n \beta_n,$$

then $\alpha$ and $\beta$ are similarly ordered, that is, for every $j, k$,

$$(\alpha_j - \alpha_k)(\beta_j - \beta_k) \geq 0.$$

In particular, for every $\sigma \in S_P$,

$$\sum_{n=0}^{P-1} n\, b_n \geq \sum_{n=0}^{P-1} \sigma(n)\, b_n$$

with equality holding if and only if $\sigma$ is trivial.

Proof. The first part of the lemma is simply a restatement of Theorems 368
and 369 of [2]. To prove the second part, note first that $b_n$ is a non-decreasing
sequence and in particular is constant on each $A_\kappa$. Theorem 368 in [2] states
that a sum of the form $\sum_n \sigma(n)\, b_n$ is maximized when $\sigma(n)$ is monotonically
increasing, which proves the given inequality. Since $b_n$ is constant on
each $A_\kappa$, it follows that if $\sigma$ is trivial, then we have equality.

If $\sigma$ is not trivial then we will show that the sequences $\sigma(n)$ and $b_n$ are
not similarly ordered. Letting $\kappa$ be the minimal index such that $A_\kappa$ is not left
invariant by $\sigma$, there exists $m \in A_\kappa$ such that $\sigma(m) \in A_\mu$ for some $\mu > \kappa$,
and for some $\lambda > \kappa$ there exists $k \in A_\lambda$ such that $\sigma(k) \in A_\kappa$. Therefore,
$b_m = \kappa < \lambda = b_k$ but since $\mu > \kappa$, $\sigma(m) > \sigma(k)$, and so $\sigma(n)$ and $b_n$ are not
similarly ordered.
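
The last statement of Lemma 3 can be verified exhaustively for small $P$. The sketch below uses hypothetical function names and reads "trivial" as "maps every block $A_\kappa$ of the trivial partition onto itself"; it enumerates all permutations of $\{0, \ldots, P-1\}$, checks the inequality, and collects the permutations for which equality holds.

```python
from itertools import permutations


def block_labels(block_sizes):
    """b_n = kappa for every n in the kappa-th block of the trivial partition."""
    return [kappa for kappa, size in enumerate(block_sizes) for _ in range(size)]


def weighted_sum(sigma, b):
    """sum_n sigma(n) * b_n for a permutation given as a sequence."""
    return sum(s * bn for s, bn in zip(sigma, b))


def preserves_blocks(sigma, b):
    """True if sigma maps every block A_kappa onto itself."""
    return all(b[sigma[n]] == b[n] for n in range(len(b)))


def check_lemma(block_sizes):
    """Return (inequality holds for all sigma, list of sigma attaining equality)."""
    b = block_labels(block_sizes)
    P = len(b)
    identity_value = weighted_sum(range(P), b)
    perms = list(permutations(range(P)))
    inequality_holds = all(weighted_sum(s, b) <= identity_value for s in perms)
    equality_cases = [s for s in perms if weighted_sum(s, b) == identity_value]
    return inequality_holds, equality_cases


holds, cases = check_lemma((2, 1, 2))          # b = (0, 0, 1, 2, 2)
print(holds)                                    # True
print(len(cases))                               # 4 = 2! * 1! * 2!
print(all(preserves_blocks(s, block_labels((2, 1, 2))) for s in cases))  # True
```

The number of equality cases equals $\prod_\kappa \ell_\kappa!$, the number of permutations that only reorder indices within blocks.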

In order to complete the proof, define $C_1, C_2 \subseteq \{0, \ldots, P-1\}$ by $n \in C_1$ if
$0 \leq \sigma(n) - b_n < P$, and $n \in C_2$ if $-P + 1 \leq \sigma(n) - b_n < 0$ (note that always
$|\sigma(n) - b_n| < P$) so that when $n \in C_2$, $(\sigma(n) - b_n) \bmod P = \sigma(n) - b_n + P$.
Let $\sigma'(n) = \sigma(n)$ if $n \in C_1$ and $\sigma(n) + P$ if $n \in C_2$, and let $(a_n)_{n=0}^{P-1}$ be an
increasing sequence enumerating the set $\sigma(C_1) \cup (\sigma(C_2) + P)$. Therefore,
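
$$\Lambda(C^\sigma) - \Lambda(C^I) = \sum_{n=0}^{P-1} (\sigma'(n) - b_n)^2 - \sum_{n=0}^{P-1} (n - b_n)^2$$

$$= \Big[ \sum_{n=0}^{P-1} (\sigma'(n) - b_n)^2 - \sum_{n=0}^{P-1} (a_n - b_n)^2 \Big] + \Big[ \sum_{n=0}^{P-1} (a_n - b_n)^2 - \sum_{n=0}^{P-1} (n - b_n)^2 \Big]$$

$$= \Big[ 2 \sum_{n=0}^{P-1} \big( a_n b_n - \sigma'(n)\, b_n \big) \Big] + \Big[ \sum_{n=0}^{P-1} \big( (a_n - b_n)^2 - (n - b_n)^2 \big) \Big] = I + II.$$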

Since $a_n$ is increasing, $I \geq 0$ by Lemma 3 and since $a_n \geq n$ for all $n$,
$(a_n - b_n) \geq (n - b_n) \geq 0$ so that $(a_n - b_n)^2 \geq (n - b_n)^2$ and hence $II \geq 0$.
It remains to show that equality holds only if $\sigma$ is trivial. If $\Lambda(C^\sigma) = \Lambda(C^I)$
then $I = II = 0$. Since $II = 0$, $C_2 = \emptyset$, for if $a_n \in \sigma(C_2) + P$ then $a_n > n$ and
we would have $II > 0$. Since $C_2 = \emptyset$, $\sigma'(n) = \sigma(n)$ so that

$$0 = \Lambda(C^\sigma) - \Lambda(C^I) = \sum_{n=0}^{P-1} (\sigma(n) - b_n)^2 - \sum_{n=0}^{P-1} (n - b_n)^2 = 2 \sum_{n=0}^{P-1} \big( n\, b_n - \sigma(n)\, b_n \big)$$

which by Lemma 3 implies that $\sigma$ is trivial. The proof is complete.